ABSTRACT

The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using
all publicly available data in the XMM-Newton Science Archive. Its main aims are
to measure cosmological parameters and trace the evolution of X-ray scaling relations.
In this paper we describe the data processing methodology applied to the 5,776 XMM
observations used to construct the current XCS source catalogue. A total of 3,675
>4-σ cluster candidates with >50 background-subtracted X-ray counts are extracted
from a total non-overlapping area suitable for cluster searching of 410 deg². Of these,
993 candidates are detected with >300 background-subtracted X-ray photon counts,
and we demonstrate that robust temperature measurements can be obtained down
to this count limit. We describe in detail the automated pipelines used to perform
the spectral and surface brightness fitting for these candidates, as well as to estimate
redshifts from the X-ray data alone. A total of 587 (122) X-ray temperatures to a
typical accuracy of <40 (<10) per cent have been measured to date. We also present
the methodology adopted for determining the selection function of the survey, and
show that the extended source detection algorithm is robust to a range of cluster
morphologies by inserting mock clusters derived from hydrodynamical simulations into
real XMM images. These tests show that the simple isothermal β-profile is sufficient
to capture the essential details of the cluster population detected in the archival XMM
observations. The redshift follow-up of the XCS cluster sample is presented in a
companion paper, together with a first data release of 503 optically-confirmed clusters.

Key words: X-rays: galaxies: clusters — galaxies: clusters: intracluster medium — 
1 INTRODUCTION

Clusters of galaxies are massive objects (10^13.5 to 10^15 M⊙) composed of
galaxies, hot ionised gas and dark matter. The gravitational potential is
dominated by dark matter, with the mass ratio of the three components being
roughly 3:10:87 respectively, although with a strong mass dependence in the
ratio of gas to stars (Gonzalez et al. 2007). Clusters provide us with the
opportunity to obtain information about the underlying cosmological model and
important insights into the processes that govern structure formation (see
Voit 2005; Allen et al. 2011 for reviews).



While detailed studies of individual clusters are extremely important,
especially for obtaining insight into the small-scale processes that influence
the evolution of their baryonic components, a full understanding of the complex
nature of cluster formation and evolution requires the study of the galaxy
cluster population as a whole. This is best achieved, in practice, by
undertaking cluster surveys. The first large cluster surveys were carried out
via eye-ball searches for galaxy over-densities on optical photographic plates
(Abell 1958; Zwicky et al. 1968), but, nowadays, cluster finding uses
sophisticated automated techniques.

In this paper we describe automated cluster finding at X-ray wavelengths; the
hot ionised gas (or intracluster medium/ICM) emits soft X-ray radiation in
proportion to the square of the electron density. However, this is not the only
way new clusters are being discovered. For example, the effect of cluster sized
gravitational potentials can be seen in the optical/infra-red, via strong or
weak gravitational lensing (e.g. Wittman et al. 2003). Increasing numbers of
clusters are also being discovered at millimetre wavelengths (e.g. Staniszewski
et al. 2009; Vanderlinde et al. 2010; Menanteau et al. 2010; Marriage et al.
2010; Williamson et al. 2011; Ade et al. 2011; Foley et al. 2011) using the
Sunyaev-Zel'dovich (SZ) effect (Sunyaev & Zeldovich 1972): the inverse Compton
scattering of photons from the cosmic microwave background (CMB) by the hot
ICM. At longer wavelengths still, one can discover clusters out to high
redshift using radio telescopes, via the unusual signature of head-tail
galaxies (Blanton et al. 2003). Due to the advent of large format CCD
detectors, cluster finding using galaxy over-densities is also currently
undergoing a renaissance (e.g. Gladders & Yee 2000; Miller et al. 2005; Koester
et al. 2007; Wilson et al. 2009).



Cluster surveys have already revolutionised our understanding of the physics of
the ICM (e.g. Ponman et al. 1999; Arnaud et al. 2010) and delivered
cosmological constraints independent of, and competitive with, those derived
from observations of the CMB (e.g. Larson et al. 2010; Dunkley et al. 2010) and
Type Ia supernovae (e.g. Kessler et al. 2009). When combined with these other
cosmological probes, clusters are playing an important role in the quest to
understand the nature of dark energy (e.g. Vikhlinin et al. 2009b; Mantz et al.
2010; Rozo et al. 2010; Sehgal et al. 2011; see Sahlén et al. 2009 for a review
of earlier cluster cosmology studies dating back to Frenk et al. 1990 and
Oukbir & Blanchard 1992). Clusters are also being used to test general
relativity on large scales (e.g. Rapetti et al. 2010), constrain the properties
of neutrinos (e.g. Mantz et al. 2010), and search for evidence of non-Gaussian
primordial density fluctuations (e.g. Hoyle et al. 2010). Future cluster
surveys will be wider, more sensitive and better calibrated than ever before,
and so are sure to deliver significantly improved constraints compared to these
existing works (e.g. Predehl et al. 2006; Majumdar & Mohr 2004; Cunha et al.
2009; Wu et al. 2010).



In this paper we present the XMM Cluster Survey (XCS), a search for
serendipitous galaxy clusters in archival XMM-Newton observations. The original
XCS concept and motivation is described in Romer et al. (2001). The main goals
of the survey are (i) to measure cosmological parameters, (ii) to measure the
evolution of the X-ray luminosity-temperature scaling relation (L_X-T_X
relation, hereafter), (iii) to study galaxy properties in clusters to high
redshift, and (iv) to provide the community with a high quality, homogeneously
selected X-ray cluster sample. The XCS follows a rich tradition of X-ray
cluster surveys dating back almost 30 years using earlier satellites:
Piccinotti et al. (1982; HEAO-1), Gioia et al. (1990; Einstein), and several derived
from the ROSAT All Sky Survey (RASS; Ebeling et al. 1998; Böhringer et al.
2000; Ebeling et al. 2000; Ebeling et al. 2001; Cruddace et al. 2002; Gioia et
al. 2003; Böhringer et al.) and the ROSAT pointed observations archive (Rosati
et al. 1998; Romer et al. 2000; Perlman et al. 2002; Mullis et al. 2003; Burke
et al. 2003; Burenin et al. 2007; Horner et al. 2008).



The XCS is not the only project currently exploiting the XMM-Newton (XMM
hereafter) archive for new detections of clusters. Other projects include: XDCP
(Mullis et al. 2005; Fassbender et al. 2008; Santos et al. 2009; Schwope et al.
2010; Fassbender et al. 2010; Suhada et al. 2011); XMM-LSS (Pierre et al. 2006;
Bremer et al. 2006; Pacaud et al. 2007; Adami et al. 2011); SEXCLAS
(Kolokotronis et al. 2006); COSMOS (Finoguenov et al. 2007); XMM-BSC (Suhada et
al. 2010); SXDS (Finoguenov et al. 2010); and one being carried out by members
of the XMM Survey Science Center (Schwope et al. 2004; Lamer et al. 2008a).
This intense international interest stems from the fact that XMM has several
features advantageous to cluster searching: in essence it combines sensitivity
and a large field of view with spectral imaging capabilities.

The XMM image quality does not match that of Chandra, but it is still good
enough to allow one to differentiate between point-like and extended sources
over the whole field of view: given that clusters dominate the extended X-ray
source population, this then allows us to identify cluster candidates
efficiently, despite the fact that clusters only comprise ~10% of the total
X-ray source population. Moreover, the spectral capabilities of XMM allow the
measurement of the temperature of the hot ICM directly from the discovery data.
These T_X measurements allow us to then estimate cluster masses, something of
vital importance to cosmological studies. Finally, the mission has been in
operation for over 10 years, and has built up a large archive of observations
distributed across the sky. By now there are several hundred square degrees
available that are suitable for a serendipitous cluster survey, already
exceeding that of the largest deep ROSAT survey (Burenin et al. 2007).
Serendipitous cluster surveys have also been conducted using the Chandra
archive (e.g. Barkhouse et al. 2006), although the available area for cluster
searching is significantly smaller in comparison to the XMM archive.



As predicted in Romer et al. (2001), and now demonstrated below, XCS will
deliver the largest number of cluster temperature measurements to date.
Importantly, these clusters will form a homogeneous sample (both in terms of
selection and analysis) and have a well-understood selection function. In a
companion paper (Mehrtens et al. 2011; M11 hereafter), we present our first
data release (XCS-DR1) and this includes 402 T_X measurements. By comparison,
the largest previous compilations of T_X values from homogeneous samples
contain fewer than 100 clusters each, e.g. Reiprich & Böhringer (2002; 63
clusters), Henry (2004; 25 clusters), Pacaud et al. (2007; 29 clusters), Pratt
et al. (2009; 31 clusters), Vikhlinin et al. (2009a; 85 clusters), and Mantz et
al. (2010; 96 clusters). Larger compilations of clusters with heterogeneous
selection do exist, and some have significantly better per cluster T_X
precision than XCS, but even so the largest published collection is still only
115 strong (Maughan 2007; a larger sample, of 273 low-redshift clusters, was
put together by Horner 2001 but was not made public).

XCS highlights to date include the detection and subsequent multi-wavelength
follow-up of a z = 1.46 cluster (XMMXCS J2215.9-1738; Stanford et al. 2006;
Hilton et al. 2007, 2009, 2010), which for several years held the record for
the highest redshift spectroscopically confirmed cluster (recent discoveries of
higher redshift X-ray clusters include Tanaka et al. 2010; Papovich et al.
2010; Henry et al. 2010; Gobat et al. 2011). XCS clusters have also been used
in compilation studies of galaxy evolution in high redshift clusters (Collins
et al. 2009; Stott et al. 2010). Conservative forecasts of the performance of
XCS for cosmological parameter estimation and cluster scaling relations can be
found in Sahlén et al. (2009): we expect to measure (at 1-σ and from clusters
alone, i.e. not in combination with CMB or supernovae observations) Ω_m to
±0.03 (and Ω_Λ to the same accuracy assuming flatness), and σ_8 to ±0.05,
whilst also constraining the normalisation and slope of the L_X-T_X relation to
±6 and ±13 per cent, respectively.

In this paper, we present an overview of the XCS data analysis strategy, from
acquiring the data to producing a catalogue. A schematic of our approach is
shown in Fig. 1, although note that components indicated with dashed outlines
are discussed in M11. The paper is broken up into 3 main sections. Section 2
describes data acquisition, reduction and image generation. Section 3 describes
source detection, the compilation of candidate lists, and simulations of the
survey selection function. Section 4 describes how we use XMM data to measure
X-ray redshifts, temperatures and luminosities for the candidates.



2 XMM DATA REDUCTION 

The XMM archive contains thousands of public observations suitable for
conducting the XMM Cluster Survey. Such a large volume of data means we have to
carry out most of the XCS in a fully automated manner: the only parts that are
not automated are the mask making (Section 2.4.1), optical follow-up, and
quality control (M11). While this automation presents a number of challenges,
in terms of handling the variety and complexity of the archival data, it also
has a number of benefits: not only has the entire data set been treated in a
consistent and systematic way, but we are also able to run realistic
simulations of our selection function.



In this section we describe how the raw XMM archive is manipulated into
science-grade image files. First the data are downloaded from the remote
storage facility at the European Space Astronomy Centre (ESAC) near Madrid to
the University of Sussex (Section 2.2). Then the data are calibrated and
cleaned of periods of high background contamination (Section 2.3). Next, images
are produced (Section 2.4) and flux conversion factors calculated (Section
2.5). We begin this section with an overview of some of the salient features of
the XMM mission.



2.1 The XMM-Newton Mission 



The XMM mission (Jansen & Laine 1997) consists of three co-aligned Wolter Type
I (Wolter 1952b,a) X-ray telescopes mounted on the same spacecraft. The mission
was undertaken by the European Space Agency (ESA) and the spacecraft was
launched on 10th December, 1999. The mission configuration, with three separate
telescopes simultaneously illuminating three cameras, means that most exposures
generate data with potential for serendipitous cluster finding: by comparison,
Chandra (Weisskopf 1999) has a single telescope that illuminates only one of
several instruments at any given time, and not all those instruments are
suitable for cluster finding.



The European Photon Imaging Camera (EPIC: Villa et al. 1996) consists of three
separate cameras, each in the focal plane of a separate X-ray telescope. Each
camera consists of an array of charge-coupled devices (CCDs: Boyle & Smith
1970) in different configurations. Two cameras, the EPIC-mos1 and 2, consist of
arrays of 7 metal oxide semi-conductor CCDs illuminated by 44% of the light
from their respective telescopes (the rest is redirected to the Reflection
Grating Spectrometers). The EPIC-pn camera consists of 12 back-illuminated
CCDs. These CCDs are not only more sensitive than those in the EPIC-mos
cameras, but the EPIC-pn receives all the light from its respective telescope.
Thus, the EPIC-pn camera has more than twice the sensitivity of the EPIC-mos
cameras.

One disappointing aspect of both XMM and Chandra has been the unexpectedly high
background in their CCD cameras. Both these missions are in similar,
highly-elliptical orbits, and it was only after their launch that it was
realised that these orbits intersect a population of low-energy protons trapped
in the Earth's magnetosphere. The lower energy protons can be funnelled by the
grazing incidence mirrors onto the detectors and this has resulted in a
significantly higher background than was expected before launch. Consequently,
certain aspects of XCS have proved to be more challenging than was anticipated
in our pre-launch predictions (Romer et al. 2001). In addition to the enhanced
background, there have been a number of incidents of damage to the EPIC cameras
while in orbit, but in only one case has this resulted in a significant loss of
detector area (Abbey et al. 2006).



2.1.1 XMM-Newton Point-Spread Function

A crucial issue for the detection of extended sources by XCS is the treatment
of the XMM Point-Spread Function (PSF). The PSF is a strong function of
off-axis angle and photon energy (where off-axis angle is the angle between the
source location and the centre of the field of view). As the off-axis angle
increases, the PSF shape morphs from being circularly symmetric to ellipsoidal
and finally bow-tie shaped. There have been a number of attempts to
characterise the XMM PSF including: simulations based on measurements of the
shape of the mirrors (Gondoin et al. 1996); measurements taken on the ground by
passing X-ray beams from synchrotron sources through XMM mirror modules
(Stockman et al. 1997; Gondoin et al. 1998); and fitting 1-dimensional profiles
to observations of bright X-ray sources (Gondoin et al. 2000; Ghizzardi 2001,
2002; Read 2004). Unfortunately, thus far, this has not resulted in a complete
and reliable characterisation of the XMM PSF. Currently four PSF models are
available: the Low, Medium, High and Extended Accuracy Models (Altieri et al.
2004). Of these, only the Medium Accuracy Model (MAM) is 2-dimensional, but as
it is based on simulations that relied on pre-launch measurements of the
mirrors, it suffers from a number of deficiencies. The Extended Accuracy Model
(EAM) is a 1-dimensional model based on in-orbit measurements of real sources,
and is considered the most accurate, but it does not encapsulate the complex
2-dimensional structure observed in the PSF at large off-axis angles. Currently
in XCS, we use the EAM when measuring source extents for both real sources and
simulated ones used to create the survey selection function (Section 3.2.3),
and when carrying out spatial fits to cluster surface brightness profiles
(Section 4.3.2), and we use the MAM when creating simulated sources for the
selection function (Section 3.4.3). In the future we hope to include the new
2-d model under development by the XMM Science Operations Centre (Read et al.
2010). This improved model will more accurately encode the off-axis, azimuthal
and energy dependencies of the PSF.

Figure 1. Flowchart showing an overview of the XCS analysis methodology. This illustrates the sequence by which data from the XMM
archive is used to create a catalogue of galaxy clusters.



2.2 Data Acquisition 

In Fig. 2 we illustrate how the non-overlapping area in the public XMM archive
has grown over the past ten years, both in terms of total area and in terms of
area suitable for the discovery of clusters, i.e. outside the Galaxy (|b| >
20°) and the Magellanic Clouds (more than 6° [3°] from the Large [Small]
Magellanic Cloud). We note that these calculations take into account other,
smaller, regions deemed by XCS to be unsuitable for serendipitous source
detection (see Section 2.4.1). By now there are over 600 deg² of the sky
covered by XMM, but of that, only ~50 deg², 280 deg² and 410 deg², at >40 ks,
>10 ks and >0 ks depths respectively (exposure times are those after flare
cleaning, Section 2.3.3), are in regions suitable for cluster searching. This
area is distributed across the sky (Fig. 3) rather than as a contiguous region.
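
The sky cuts quoted above (Galactic latitude and distance from the Magellanic Clouds) can be illustrated with a short Python sketch. This is not the XCS code: the function names are hypothetical, positions are assumed to be already in Galactic coordinates, and the Cloud centres are approximate values adopted here for illustration only.

```python
import math

# Approximate Galactic coordinates (l, b), in degrees, of the Magellanic Cloud
# centres. ASSUMPTION: illustrative values, not taken from the paper.
LMC_CENTRE = (280.5, -32.9)
SMC_CENTRE = (302.8, -44.3)


def angular_separation(l1, b1, l2, b2):
    """Great-circle separation in degrees between two (l, b) positions."""
    l1, b1, l2, b2 = (math.radians(v) for v in (l1, b1, l2, b2))
    cos_sep = (math.sin(b1) * math.sin(b2)
               + math.cos(b1) * math.cos(b2) * math.cos(l1 - l2))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def suitable_for_cluster_search(l, b):
    """Apply the |b| > 20 deg cut and the 6 deg [3 deg] LMC [SMC] exclusion."""
    if abs(b) <= 20.0:
        return False
    if angular_separation(l, b, *LMC_CENTRE) <= 6.0:
        return False
    if angular_separation(l, b, *SMC_CENTRE) <= 3.0:
        return False
    return True
```

The smaller, manually identified regions mentioned in the text (Section 2.4.1) are not represented in this sketch.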

As shown in Fig. 2, new data enters the archive almost every day, but due to
practical constraints we have only processed the data in a small number of
large batches, corresponding to all the public EPIC data available at that
particular time. The downloads take advantage of the Archive Interoperability
System (AIO: Arviset et al. 2004); this protocol allows the XMM Science Archive
(XSA: Clavel 1998; Arviset et al. 2002) to be searched in an automated fashion.
At the time of writing, the most recent download was completed on 21st July
2010, corresponding to 5,776 separate XMM observations. Their locations are
shown in Fig. 3. Each of these observations (including those broken down into
multiple exposures) has a unique identification number, or ObsID. In the
following, we use the term ObsID to refer to the set of Observation Data Files
(ODF) that contains all the observation-specific data. We note that, even with
appropriate compression etc., the XCS archive, of raw and processed data
products, amounts to on the order of 4 terabytes.

Figure 3. The distribution on the sky of the 5,776 ObsIDs in the XMM archive as of 21st July 2010. Locations in green [blue] are inside
[outside] the proposed footprint of the Dark Energy Survey (darkenergysurvey.org). The Galactic plane and locations of the Magellanic
Clouds are highlighted by the red dashed line (we do not carry out cluster searches within those regions).



Figure 2. Cumulative sky area covered by public data in the
XMM archive as a function of exposure time, for the whole sky
(solid) and excluding the Galactic plane and Magellanic Clouds
(dashed) and for a variety of different exposure time cuts, at the
time of the most recent XCS download in July 2010. The flattening
of the curves mid-way through 2009 reflects the fact that
proprietary observations only become public a year after they
are completed, so that most data taken after that time were still
proprietary at the time of the download.



2.3 Data Reduction

The data reduction was carried out in a fairly standard manner (see for
instance section 3 of Read & Ponman 2003). Only events with patterns
(characterisations of how many CCD pixels are involved in an event) 0-4 were
used for the EPIC-pn and 0-12 for the EPIC-mos. A schematic of the data
reduction procedure is shown in Fig. B1.



2.3.1 Calibration

The reduction and analysis of XMM data requires calibration information
detailing how the telescopes and instruments behave, e.g. the effective area of
the XMM telescopes and the detection efficiency of the instruments (both being
functions of photon energy and detector position), plus the instrumental
uncertainty associated with measuring photon energies. The most up-to-date
version of the XMM Current Calibration Files (CCF), as of 21st July 2010, was
used for the analysis presented herein.



2.3.2 Software Versions

Several different software packages are deployed for XCS analysis: version
10.0.0 of the Science Analysis Software (SAS: Gabriel et al. 2004); version 6.9
of HEASOFT (Blackburn 1995); version 4.2 of CIAO (Doe et al. 2001; Deponte
Evans et al. 2008); and version 12.6.0i of XSPEC (Arnaud 1996). In order for
these packages to be used in the automated batch manner needed for XCS, several
different wrapper programmes were written in scripting languages. For the work
described in Sections 2 and 4, version 2.6.4 of Python (docs.python.org) was
used to write these wrapper programmes, whereas version 7.1 of IDL
(www.ittvis.com) was used for the work presented in Section 3.



2.3.3 Flare Cleaning 

One important aspect of our pipeline reduction was the treatment of background
flares. It is well documented (Lumb et al. 2002; Read & Ponman 2003; Pradas &
Kerp 2005) that XMM observations often suffer from periods of enhanced particle
background, caused mostly by variations in solar activity in conjunction with
the position of the spacecraft in its orbit. To increase the signal-to-noise of
the data, we have designed an automated procedure to remove periods of high
background. This was achieved by creating a lightcurve, divided into 50-second
bins. The bin size was chosen to balance a reasonable time resolution with
minimising shot noise. The lightcurve was first generated, and cleaned, using
the high-energy events (12-15 keV for the EPIC-pn and 10-12 keV for the
EPIC-mos cameras), because these events are more likely to be from the particle
background than from astronomical sources. The cleaning process was then
repeated, using a soft-energy lightcurve (0.2-1.0 keV), to account for periods
of elevated background coming from soft protons.
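
As a rough illustration of the lightcurve construction described above, the following Python sketch bins an event list into 50 s bins within a chosen energy band. The event representation (a list of `(time_s, energy_keV)` tuples) and the function name are assumptions made for this example; the actual pipeline operates on SAS event files.

```python
import math

# Energy bands (keV) quoted in the text for the flare-cleaning lightcurves.
HARD_BAND = {"pn": (12.0, 15.0), "mos": (10.0, 12.0)}
SOFT_BAND = (0.2, 1.0)


def make_lightcurve(events, e_min, e_max, t_start, t_stop, bin_size=50.0):
    """Count events per time bin within the energy band [e_min, e_max).

    `events` is an iterable of (time_s, energy_keV) tuples; this layout is a
    simplifying assumption for the sketch.
    """
    n_bins = int(math.ceil((t_stop - t_start) / bin_size))
    counts = [0] * n_bins
    for time_s, energy_kev in events:
        if e_min <= energy_kev < e_max and t_start <= time_s < t_stop:
            counts[int((time_s - t_start) // bin_size)] += 1
    return counts
```

For an EPIC-pn exposure the hard-band lightcurve would be obtained with `make_lightcurve(events, *HARD_BAND["pn"], t_start, t_stop)` and the soft-band one with `make_lightcurve(events, *SOFT_BAND, t_start, t_stop)`.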

The cleaning process for each energy band involved an iterative 3-σ clipping
procedure that selected which 50 s bins to exclude. The mean and standard
deviation of the lightcurve were calculated and bins more than ±3σ from the
mean were removed. The 3-σ limits were then re-calculated and the process
repeated up to 50 times or until a stable state was reached, whereby the bins
that are being excluded are not changing (note that previously excluded bins
can be re-instated in subsequent iterations if the 3-σ limits become larger).
The maximum of 50 iterations was set to avoid cases where the stable solution
oscillates between two or more similar states.
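
The iterative clipping just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the XCS implementation: the function name is hypothetical, and the population standard deviation is used because the text does not say which estimator was adopted.

```python
import statistics


def sigma_clip_lightcurve(counts, n_sigma=3.0, max_iter=50):
    """Return a list of booleans marking which bins survive the clipping.

    At each iteration the mean and standard deviation are computed from the
    currently included bins, and then EVERY bin is re-tested against the new
    limits, so a previously excluded bin can be re-instated if the limits widen.
    """
    included = [True] * len(counts)
    for _ in range(max_iter):
        kept = [c for c, keep in zip(counts, included) if keep]
        if len(kept) < 2:
            break  # too few bins left to define a meaningful scatter
        mean = statistics.mean(kept)
        sigma = statistics.pstdev(kept)  # ASSUMPTION: population estimator
        new_included = [abs(c - mean) <= n_sigma * sigma for c in counts]
        if new_included == included:
            break  # stable state: the excluded set is no longer changing
        included = new_included
    return included
```

The `max_iter` cap plays the role of the 50-iteration limit in the text: if the selection oscillates between similar states the loop simply stops and returns the most recent one.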

We note that before the first 3-σ clipping took place, an initial maximum-rate
threshold was used to 'clip' the lightcurve. This threshold was the greater of
either 50 counts per bin for the EPIC-pn (and half this for the EPIC-mos
cameras) or 125 per cent of the highest value in the lowest 5 per cent of the
bins. This initial filtering was found to improve the flare cleaning results
when flares accounted for a large fraction of the total exposure time. A
flowchart illustrating the flare cleaning steps is shown in Fig. B2. Fig. 4
shows an example hard-band lightcurve before and after cleaning.
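
The initial maximum-rate threshold can be written down compactly. The sketch below is one reading of the rule quoted above; the function names are hypothetical, and the handling of very short lightcurves (at least one bin is always taken as the "lowest 5 per cent") is an assumption.

```python
def initial_rate_threshold(counts, camera="pn"):
    """Greater of a fixed floor and 125 per cent of the quiescent level.

    The floor is 50 counts per bin for EPIC-pn and half that for EPIC-mos.
    The quiescent level is the highest value among the lowest 5 per cent of bins.
    """
    floor = 50.0 if camera == "pn" else 25.0
    ordered = sorted(counts)
    # ASSUMPTION: use at least one bin when the lightcurve has fewer than 20.
    n_low = max(1, len(ordered) * 5 // 100)
    quiet_level = ordered[n_low - 1]
    return max(floor, 1.25 * quiet_level)


def apply_initial_clip(counts, camera="pn"):
    """Boolean mask of bins at or below the initial threshold."""
    threshold = initial_rate_threshold(counts, camera)
    return [c <= threshold for c in counts]
```

Bins rejected here would simply be excluded before the iterative 3-σ clipping starts, which keeps a heavily flared exposure from inflating the first estimate of the mean and scatter.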

The combination of the excluded bins for the hard and soft-background
lightcurves is then used to define the good time intervals (GTI) used to filter
the raw event files. Fig. 5 shows the distribution of ObsID exposure times
before and after the process of flare cleaning. The filtered event files are
used several times during XCS analysis. They are used to produce the images
(Section 2.4) used for the initial XCS source detection (Section 3.1), and then
again to determine spectroscopic (Sections 4.2 and 4.4) and spatial parameters
(Section 4.3) for the cluster candidates.
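
A minimal sketch of how the hard- and soft-band selections might be combined into good time intervals is given below. It assumes both lightcurves share the same 50 s binning and start time, and that each is represented by a list of booleans (True for a bin that survived cleaning); these conventions and the function names are assumptions of the example.

```python
def good_time_intervals(hard_ok, soft_ok, t_start=0.0, bin_size=50.0):
    """Merge consecutive bins that are good in BOTH bands into (start, stop) pairs."""
    intervals = []
    current_start = None
    n_bins = 0
    for i, (hard, soft) in enumerate(zip(hard_ok, soft_ok)):
        n_bins = i + 1
        good = hard and soft
        bin_start = t_start + i * bin_size
        if good and current_start is None:
            current_start = bin_start           # open a new interval
        elif not good and current_start is not None:
            intervals.append((current_start, bin_start))  # close the interval
            current_start = None
    if current_start is not None:
        intervals.append((current_start, t_start + n_bins * bin_size))
    return intervals


def filter_events(events, intervals):
    """Keep events whose time (first tuple element) falls inside any interval."""
    return [ev for ev in events
            if any(start <= ev[0] < stop for start, stop in intervals)]
```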



2.4 Image Production 

Starting with the cleaned event lists described above (Section 2.3.3), the
individual camera exposures were spatially binned, with a pixel size of 4.35
arcsec, to generate images. This pixel size was chosen because it is smaller
than the PSF, at all detector locations and photon energies. Images were
produced in two bands, soft (0.5-2.0 keV) and hard (2-10 keV). Exposure maps
were also created for each image. The exposure maps encode the impact of
vignetting on the image sensitivity and also record the locations of chip gaps,
bad rows, etc.
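
The spatial binning step can be illustrated as follows. The sketch assumes event positions are given in arcseconds relative to the image corner, which is a simplification made for the example; real event files carry sky or detector coordinates that need a proper projection.

```python
PIXEL_SIZE_ARCSEC = 4.35
BANDS_KEV = {"soft": (0.5, 2.0), "hard": (2.0, 10.0)}


def bin_events_to_image(events, band, nx, ny):
    """Bin (x_arcsec, y_arcsec, energy_keV) events into an ny-by-nx counts image."""
    e_min, e_max = BANDS_KEV[band]
    image = [[0] * nx for _ in range(ny)]
    for x, y, energy in events:
        if not (e_min <= energy < e_max):
            continue  # outside the requested energy band
        col = int(x // PIXEL_SIZE_ARCSEC)
        row = int(y // PIXEL_SIZE_ARCSEC)
        if 0 <= col < nx and 0 <= row < ny:
            image[row][col] += 1
    return image
```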

The EPIC cameras do not have shutters, so events received while an observation
is reading out, the so called out-of-time events, will be assigned incorrect
positions and energies. For XCS, only EPIC-pn images were corrected for
out-of-time events, because the EPIC-mos cameras have a much lower readout rate
and negligible out-of-time events. The EPIC-pn corrections were done in the
standard way, i.e. the event file was recreated assuming all the events are
out-of-time and assigning them new positions along the CCD column at random.
These are then used to create out-of-time images that can be subtracted off the
true images (with the appropriate correction for the fraction of out-of-time
events).
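
The idea behind the out-of-time correction can be sketched on a toy image. In this simplified example the readout direction is assumed to run along image columns and counts are re-drawn uniformly along each column; the real correction works on the event file in CCD coordinates. The out-of-time fraction is passed in as a parameter because it depends on the observing mode and is not given in the text.

```python
import random


def make_oot_image(image, rng):
    """Re-assign every count to a random row within its original column."""
    ny, nx = len(image), len(image[0])
    oot = [[0] * nx for _ in range(ny)]
    for row in image:
        for col, n_counts in enumerate(row):
            for _ in range(n_counts):
                oot[rng.randrange(ny)][col] += 1
    return oot


def subtract_oot(image, oot_image, oot_fraction):
    """Subtract the out-of-time image, scaled by the out-of-time fraction."""
    return [[pix - oot_fraction * oot_pix for pix, oot_pix in zip(row, oot_row)]
            for row, oot_row in zip(image, oot_image)]
```

A call such as `subtract_oot(image, make_oot_image(image, random.Random(0)), 0.063)` would apply a 6.3 per cent correction; that particular value is used here purely as an illustrative input.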

The images and exposure maps for the individual cameras were merged to create a
single image and exposure map per ObsID. For this, the pixel values in the
EPIC-mos maps were scaled to that of the EPIC-pn camera using the previously
calculated ECFs (Section 2.5). Examples of XCS generated exposure maps and
images can be seen in Figs A1 and 6. A total of 5,642 image files have been
generated from the 5,776 XMM ObsIDs that make up the current XCS dataset (a
small number of ObsIDs in the archive are not suitable for automated image
generation for a variety of technical reasons such as telemetry and calibration
issues, etc.).
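
One plausible way to carry out such a merge is sketched below. It assumes that the ECF behaves as a count-rate per unit flux, so that an EPIC-mos exposure map multiplied by ECF_mos/ECF_pn becomes an "EPIC-pn equivalent" exposure, and that the counts images are simply summed. Both choices, and the function name, are assumptions of this example rather than a statement of the XCS implementation.

```python
def merge_cameras(images, exposure_maps, ecfs, reference="pn"):
    """Sum counts images and ECF-scaled exposure maps over cameras.

    `images` and `exposure_maps` map a camera name to a 2D list of equal shape;
    `ecfs` maps a camera name to its energy conversion factor.
    """
    ny = len(images[reference])
    nx = len(images[reference][0])
    merged_image = [[0] * nx for _ in range(ny)]
    merged_exposure = [[0.0] * nx for _ in range(ny)]
    for camera in images:
        # Scale this camera's exposure to the reference camera's sensitivity.
        scale = ecfs[camera] / ecfs[reference]
        for r in range(ny):
            for c in range(nx):
                merged_image[r][c] += images[camera][r][c]
                merged_exposure[r][c] += scale * exposure_maps[camera][r][c]
    return merged_image, merged_exposure
```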



2.4.1 Image Masking 

The production of images is an automated process; however, the images do need
to be checked by eye before passing them to the source detection pipeline
(Section 3.1). This is because we download all public data, regardless of the
intended (by the PI) target. As a result, the XCS image archive includes ObsIDs
with very extended targets (such as low-redshift clusters or Galactic supernova
remnants) and ObsIDs with very bright targets (such as luminous AGN). The very
extended targets will enhance the background level over the majority of the XMM
field of view, and thus reduce our ability to make serendipitous detections of
sources. The very bright sources will generate artefacts in the images, such as
radial spikes and out-of-time bleed trails; those artefacts could then be
falsely identified as additional sources. The eye-balling process identifies
ObsIDs that should be completely excluded from the other stages of the XCS
pipelines. It also allows us to mask out regions of ObsIDs that are only
partially afflicted by bright/extended targets. Approximately one-third of
ObsIDs require some degree of masking, with the median area lost being around 4
per cent (though this can be as high as 80 per cent in extreme cases). The mask
files are of the same dimensions as the image files and are used during the
source detection and also when creating backgrounds for the spectral and
spatial fitting. We show some examples of XCS images that require full or
partial masking in Fig. 7.
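
Since the mask files share the dimensions of the images, applying a mask and measuring the area lost are both simple element-wise operations. The sketch below (written for Python 3) assumes a convention in which a mask value of 1 marks a usable pixel and 0 a masked one; the text does not specify the convention, so this is an assumption.

```python
def apply_mask(image, mask):
    """Set every pixel flagged as unusable (mask value 0) to zero."""
    return [[pix if keep else 0 for pix, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def fraction_masked(mask):
    """Fraction of the image area removed by the mask."""
    total = sum(len(row) for row in mask)
    lost = sum(1 for row in mask for keep in row if not keep)
    return lost / total
```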



2.5 Energy Conversion Factors 

In order to be able to convert image source counts into energy fluxes, energy
conversion factors (ECFs) need to be calculated. These are necessarily model
dependent and are affected not only by the source and instrument properties but
also by the HI column, n_H hereafter, along the line of sight. In our survey,
the source properties are not known in advance, so a generic model has to be
assumed. Since the vast majority of the sources detected by XCS are point
sources, and point sources are likely to have power-law spectra, the model used
to calculate the conversion is an absorbed power law with a canonical AGN index
of 1.7 (Mushotzky et al. 1993). The photoelectric absorption is set to the
appropriate n_H value for the field (Section 2.5.1). The ECFs were calculated,
using the XSPEC spectral fitting package and the on-axis spectral responses,
for each camera exposure related to a particular ObsID. For the specified
model, the ratio of the resulting flux and count-rate is stored as the ECF for
that exposure. ECFs are not exposure time dependent, but due to variations in
n_H, the choice of optical blocking filter and the effective area of the
instrument, ECFs in XCS still vary from exposure to exposure and from ObsID to
ObsID. They generally range from 4.4 to 6.6 for the EPIC-pn and 1.6 to 2.0 for
the EPIC-mos cameras (in units of 10⁻¹¹ ct cm⁻² erg⁻¹). Even though the ECFs
are calculated for the on-axis aim point, they can still be used for sources
detected anywhere in the field of view, by correcting them using the exposure
map.

We also calculate, for each ObsID, a further set of conversions using the MEKAL
model (Mewe et al. 1986). The MEKAL model is the standard model used to
describe thermal and line emission from clusters of galaxies. The MEKAL
conversions are done over a grid of n_H, temperature and redshift, however the
metal abundance is kept fixed at Z = 0.3 × the Solar values in Anders &
Grevesse (1989). (This choice of metallicity is standard in the field because
previous work, such as by Maughan et al. (2008), has shown that abundances vary
little from this value over a wide range of redshifts.) The gridded MEKAL
conversions can be used to convert count-rates to bolometric luminosities and
vice versa (and we refer to these conversions as LCFs hereafter). The LCFs are
used to calculate synthetic cluster count-rates for the survey selection
function (Section 3.4.3) and to estimate luminosities for XCS candidates during
the literature redshift search (Section 4.1). The LCFs, like the ECFs, are
calculated for the on-axis aim point, but can be adjusted to another location
using the exposure map.

Figure 4. EPIC-pn example hard-band lightcurve with 50 s bins. Left panel: Raw events before cleaning. Right panel: Cleaned events
with periods of high background removed. The horizontal axis of each panel is Time (s).

Figure 5. The distribution of ObsID exposure times. Left panel: The number of ObsIDs before (green) and after (blue) the process
of flare cleaning. Right panel: The number of ObsIDs in which extended XCS sources with 300 or more counts were detected (red),
compared with all ObsIDs (after flare cleaning).

© 2010 RAS, MNRAS 000

The XMM Cluster Survey: X-ray analysis methodology



2.5.1 Galactic HI Column 

X-ray photons are absorbed by material along the line of
sight, and in particular by helium and oxygen for photons
above ~0.5 keV (Wilms et al. 2000). One can predict the
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level of absorption if $n_{\rm H}$ is known, so, for XCS, we estimated
the $n_{\rm H}$ values for each source using the compilation of Dickey
& Lockman (1990), which combines the Bell Labs HI Survey
(Stark et al. 1992) data with other surveys for all-sky coverage.
We use $n_{\rm H}$ to calculate ECFs and LCFs (see above),
but also at other points in the XCS pipeline, e.g. when
analysing X-ray spectra (Section 4.2). We note that
self-shielding of molecular hydrogen, from ambient ultra-violet
radiation, can occur when $n_{\rm H} > 5 \times 10^{20}$ cm$^{-2}$ (Arabadjis &
Bregman 1999). This molecular gas absorbs X-rays and thus
distorts flux conversions that are based only on $n_{\rm H}$ values.
For this reason, XCS fluxes derived when $n_{\rm H} > 5 \times 10^{20}$
cm$^{-2}$ should be regarded as lower estimates.
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3 GENERATION OF THE XCS SOURCE 
CATALOGUE 

In this section, we provide details of our source detection
algorithm, known as the XCS Automated Pipeline Algorithm
or Xapa. In Section 3.1 we explain how Xapa applies
wavelets to the pipeline generated images (Section 2.4)
to generate a source list per ObsID. In Section 3.2 we
describe the parameters that are measured by Xapa for each
detected source. In Sections 3.3 and 3.4 we demonstrate the
quality of the Xapa data products for point and extended
sources respectively. In Appendix B we provide related flow
charts.



3.1 Source Detection 

Xapa source detection is based upon the mission-independent
source detection package WavDetect (Freeman et al. 2002;
F02 hereafter), which is available as part of
the CIAO software package. F02 have shown that WavDetect's
wavelet-based algorithm is more sensitive than standard
sliding-cell algorithms (e.g. CellDetect from CIAO,
Fruscione et al. 2006) and is considerably faster than
event-list-based algorithms such as CIAO's VTPdetect. Before
deciding to use WavDetect as the basis for the Xapa
algorithm, we also examined the XMM SAS Ewavelet program
and the SExtractor package (Bertin & Arnouts 1996),
finding them both to be inadequate for our purposes (see
Davidson 2006 for a discussion).

The F02 version of WavDetect consists of two components,
wtransform and wrecon. The former convolves binned
images with Mexican Hat (Slezak et al. 1990) wavelet
functions with various user-specified scale sizes and then
identifies pixels that are significantly above the background. In
Xapa, we use the F02 version of wtransform as part of an
automated pipeline known as md_detect¹, as illustrated by
the flowchart of Fig. B3. We use a set of nine wavelet scales,
numbered according to increasing size, and corresponding to
$\sqrt{2}$, 2, $2\sqrt{2}$, 4, $4\sqrt{2}$, 8, $8\sqrt{2}$, 16 and 32 image pixels. At
each scale, the convolved image is compared with a threshold
image. Convolved image pixels with values greater than
their corresponding threshold image pixels are assumed to
be associated with astronomical sources ('significant pixels'
hereafter). For those pixels, we reject the null hypothesis
that they are consistent with the measured background. We
then generate a set of support images, which record the
significant pixels at each wavelet scale.

In order to enhance the detectability of faint extended 
emission, md_detect performs the wavelet analysis in two 
stages (or 'Runs'). In Run 1 (scales 1-2), bright compact 
sources are located first. These are then masked out before 
performing Run 2 (scales 3-9). The masking step was found 
to be necessary because bright point sources can pollute the 
wavelet signal on large scales, and hence mimic extended 
sources. Unfortunately, this masking can occasionally result
in genuine extended sources being excluded from the candidate
list, so an extra step was added to Xapa to mitigate
this effect (Section 3.1.1).
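
As an illustration of the thresholding and two-run split described above, the following Python sketch lists the nine scales, forms a support image by comparing a convolved image with its threshold image, and masks Run 1 detections before Run 2. It is not the md_detect code: the function names, the list-of-lists image layout and the fill value used for masked pixels are assumptions made for this example, and the wavelet convolution itself is omitted.

```python
import math

# Nine wavelet scales in image pixels, as listed in the text.
SCALES = [math.sqrt(2), 2, 2 * math.sqrt(2), 4, 4 * math.sqrt(2),
          8, 8 * math.sqrt(2), 16, 32]
RUN1_SCALES = SCALES[:2]   # Run 1 uses scales 1-2 (bright compact sources)
RUN2_SCALES = SCALES[2:]   # Run 2 uses scales 3-9 (after masking)


def support_image(convolved, threshold):
    """Mark 'significant pixels': convolved value above the threshold image."""
    return [[c > t for c, t in zip(c_row, t_row)]
            for c_row, t_row in zip(convolved, threshold)]


def mask_sources(image, mask, fill=0.0):
    """Blank out pixels flagged in `mask` (e.g. Run 1 detections).

    The fill value is a placeholder; the text does not say what the
    pipeline substitutes for masked pixels.
    """
    return [[fill if m else v for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(image, mask)]
```

In this picture, Run 2 would be carried out on the output of `mask_sources`, so that bright compact sources cannot leak signal into the larger scales.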



The second component of the F02 version of WavDetect
is wrecon. This generates a source list for each image,
by grouping collections of significant pixels together
into source regions, or 'cells'. A drawback of the F02
version of wrecon is that it uses the instrument PSF to define
the size of the cells. This means that extended sources can
be broken up into multiple contiguous 'sources' (because a
single PSF-sized cell is not big enough to enclose all the
flux). To overcome this problem, we wrote a modified
version of wrecon, called md_recon, for Xapa. Unlike wrecon,
md_recon does not assume a priori the size of the detected
sources, and is consequently considerably better at fitting
ellipses to extended sources. The operation of md_recon is
illustrated by the flowchart in Fig. B4. At each wavelet
scale, md_recon first combines lists of significant source
pixels into source cells. Multi-scale objects, i.e. those detected
by md_detect on multiple scales, are then filtered using a
'vision model' (Section 3.1.2). The vision model is a set of
rules for combining the support images derived for different
wavelet scales. The vision model is able to recognise when
a point source is embedded in an extended source. It also
fits elliptical regions to the recovered sources (the region
enclosed by a source ellipse is referred to as $e_f$ in the following
descriptions).



3.1.1 Extended Sources with Central Cusps 

The two step (Run 1, Run 2) procedure adopted by 
md_detect for source detection works well, in that it pre- 
vents bright point sources from contaminating the extended 
source list. However, it has the disadvantage that when a 
genuine extended source is detected in Run 1, it will be 
excluded from Run 2. This means that its size will be un- 
derestimated by the vision model, and it will not appear in 
an extended source list. Extended sources with cuspy bright- 
ness profiles will be particularly affected by this, e.g. clusters 
with cool cores. We have therefore devised a 'cuspiness test' 
that is carried out between Run 1 and Run 2. This involves 
generating a grid of 5 by 5 pixels, Q, centred on the position 
of each source detected in Run 1. A quantity, C, represent- 
ing the cuspiness of the central region is then calculated, as 
follows: 



$$C = \frac{Q_{\rm max} - Q_{\rm min}}{Q_{\rm max}} \qquad (1)$$



1 Where the md_ prefix acknowledges the architect of the routine, 
Michael Davidson. 



Tests showed that real point sources have C > 0.85, so if
a Run 1 source is found to have C < 0.85 (i.e. it possesses
a flatter central profile than a real point source), it
is removed from the list of Run 1 detections, so that it
remains available to be detected again in Run 2. This situation
is illustrated by Fig. 8.
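
A minimal Python sketch of the cuspiness test is given below for illustration. It assumes that C is the fractional drop from the brightest to the faintest pixel in the 5 by 5 grid Q, i.e. C = (Q_max - Q_min)/Q_max; that definition, the function names and the treatment of the boundary case C = 0.85 (which the text leaves unspecified) are assumptions of this example rather than a description of the Xapa code.

```python
def cuspiness(grid):
    """Cuspiness C of a 5x5 pixel grid Q centred on a Run 1 source.

    Assumed definition: C = (Q_max - Q_min) / Q_max.
    """
    values = [v for row in grid for v in row]
    q_max, q_min = max(values), min(values)
    if q_max <= 0:
        raise ValueError("grid must contain at least one positive pixel")
    return (q_max - q_min) / q_max


def stays_in_run1(grid, threshold=0.85):
    """True if the source is peaked enough to be kept as a Run 1 detection.

    Sources with C below the threshold are released to Run 2. Keeping
    C == threshold in Run 1 is an arbitrary choice made here.
    """
    return cuspiness(grid) >= threshold
```

For example, a grid with a single bright central pixel on a faint, flat surround gives C close to 1 and stays in Run 1, whereas a grid whose centre is only twice as bright as its edge gives C = 0.5 and is passed on to Run 2.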



3.1.2 The Xapa Vision Model 

Here we give more details about the vision model used to 
filter sources detected at multiple scales by md_recon. To 
describe our vision model we introduce the following two 
terms: a 'structure' is a connected set of pixels in the sup- 
port image for a particular scale; and an 'object' is a set of 
connected structures from different scales. The steps are: 

(i) For each structure, comprising a set of pixels $\{(x,y)\}$ in
$S_i$, which is the support image for scale $i$, determine whether
the structure defines the 'root' of an object, i.e. whether
$S_j(\{(x,y)\}) = 0$ for all $j < i$.

(ii) For each such root, check to see if there is a structure
in the scale above at this position, i.e. if $\exists (x',y')$, $(x',y') \in
\{(x,y)\}$, $S_{i+1}(x',y') \neq 0$.

(iii) If such a structure exists, and its maximum pixel
value lies within $\{(x,y)\}$, then these two structures are
linked, such that the image pixels belonging to the object
comprise the union of the pixels in the linked structures from
scales $i$ and $i + 1$.

(iv) The process of upward linking continues until the
condition in step (ii) is not satisfied, at which time the
object is terminated. When each scale has been scanned for
root structures and they have been propagated in the
'tree-like' fashion, then for each object created there exists a set
of image pixels belonging to it. An ellipse can then be fitted
to these regions and a source list created.

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Figure 8. Illustration of the effect of extended source cuspiness. Left: The original (before the cuspiness test was introduced) Run 1
(blue) and Run 2 (green) detections. Middle: The final source list if the cuspiness test is not performed. Right: The final source list (after
the cuspiness test was introduced). Extended and point sources have green and red outlines respectively.

This vision model can handle both point and extended 
sources. Crucially, it can also cope with point sources em- 
bedded in extended sources, and with close pairs of points 
which should be separated rather than blended. A schematic 
to illustrate how the vision model works when a point source 
is embedded in an extended source can be seen in Fig. 9.
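
The linking steps of the vision model can be sketched in Python as follows. This is an illustrative reading of steps (i) to (iv), not the md_recon implementation. Each structure is represented here as a pair (set of pixels, position of its maximum coefficient), and any structure that has not already been linked from the scale below is treated as the start of a new object, which is how the caption of Fig. 9 describes the larger-scale structure. The data layout and the function name are assumptions of this example.

```python
def build_objects(structures):
    """Link structures across wavelet scales into objects.

    structures[i] is a list of (pixels, peak) pairs for scale i, where
    `pixels` is a set of (x, y) tuples and `peak` is the (x, y) position
    of the maximum coefficient of that structure.
    Returns a list of pixel sets, one per object.
    """
    claimed = set()   # (scale, index) of structures already in an object
    objects = []
    for i, level in enumerate(structures):
        for k, (pixels, _peak) in enumerate(level):
            if (i, k) in claimed:
                continue              # already linked from a lower scale
            claimed.add((i, k))       # start a new object at this root
            obj_pixels = set(pixels)
            current, scale = pixels, i
            while scale + 1 < len(structures):
                link = None
                for m, (up_pixels, up_peak) in enumerate(structures[scale + 1]):
                    if (scale + 1, m) in claimed:
                        continue
                    # step (ii): a structure exists above this position;
                    # step (iii): its maximum lies within the current one
                    if current & up_pixels and up_peak in current:
                        link = m
                        break
                if link is None:
                    break             # step (iv): object terminated
                claimed.add((scale + 1, link))
                current = structures[scale + 1][link][0]
                obj_pixels |= current
                scale += 1
            objects.append(obj_pixels)
    return objects
```

With a compact structure detected at the two smallest scales and a broad structure at a third scale whose maximum lies outside the compact one, this returns two objects: the point source and the surrounding extended source.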



3.2 Source Properties 

Once md_recon has been run on a given image, the source
list is passed on to the next part of the Xapa pipeline,
find_srcprop. The two-stage operation of find_srcprop
is illustrated by the flowchart in Fig. B5. In the first
stage, find_srcprop determines the significance of each
detected source. In the second, a sub-routine known
as find_srcprop_final, computes other source properties
(such as the count-rate and probability of extent); it is the
results from the find_srcprop_final that appear in the
XCS data tables (Section 3.2.7).



3.2.1 Measuring Source and Background Counts

Here we describe how background corrected source counts
were calculated in Xapa by find_srcprop and by its
sub-routine find_srcprop_final. Tests during the development
of Xapa showed that the best results were obtained using
different aperture sets for each stage. The aperture set
comprises the region for source flux determination, the region for
background flux determination and a masked region (which
is not used for either). In Table 1 we note the configuration
for both aperture sets. In specifying these, we denote by $e_f$
the ellipse as fitted to the object region, so that $3e_f$ is the
ellipse with major and minor axes three times those fitted to
the source by the vision model. We use Uniq(X) to denote,
for a particular source, those pixels which lie only within
region X defined relative to that source: e.g. Uniq($3e_f$) defines,
for a particular source, the set of pixels which lie within the
$3e_f$ region for that source and for no other (as illustrated in
Fig. 10).

Table 1. Mask and aperture configurations for source and
background flux determination used in find_srcprop and
find_srcprop_final.

Type      Configuration (find_srcprop)
Run 1     Mask: Run 1 sources masked at $2e_f$
          Flux: $1e_f$ + Uniq($3e_f$)
          Background: Inner radius at $2e_f$, min. area = 400 pix
Run 2     Mask: All sources masked at $3e_f$
          Flux: $1e_f$ + Uniq($3e_f$)
          Background: Inner radius at $3e_f$, min. area = 2000 pix

Type      Configuration (find_srcprop_final)
Point     Mask: Point sources masked at $2e_f$
          Flux: $1e_f$
          Background: Inner radius at $2e_f$, min. area = 400 pix
Extended  Mask: All sources masked at $3e_f$
          Flux: $1e_f$ + Uniq($3e_f$), with internal point sources masked at $1e_f$
          Background: Inner radius at $3e_f$, min. area = 2000 pix



Figure 9. Illustration of the 'tree' vision model. Left: The source configuration showing a point source embedded in a larger source. The
dashed line indicates a 1-d cut through the sources. Right: A schematic of the significant pixels at each scale showing how the structures
are connected to form objects. The vertical bars denote the position of the maximum coefficient at each scale. The maximum of scale 3
lies outside of the structure of scale 1 hence a new object is started.

The expected background contribution is computed locally.
An elliptical annulus is placed around the source position:
the inner edge varies but is usually at $3e_f$ and the
outer edge is increased until there are at least 2000
background pixels, or no more area is available. The background
count-rate, $b_{\rm pix}$, is then calculated as $b_{\rm pix} = B/(E' \times a')$,
where $B$ is the total number of counts in the annulus, $E'$
the mean exposure in the annulus and $a'$ is the number of
pixels in the annulus. The expected number, $B_a$, of
background counts within the source aperture is then computed
as $B_a = b_{\rm pix} \times E \times a$, where $E$ is the mean exposure in the
source aperture and $a$ the number of pixels in the aperture.
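
The two background expressions above translate directly into code. The short Python sketch below is illustrative only; the function and argument names are chosen for this example and the numbers in the usage note are invented.

```python
def background_rate_per_pixel(annulus_counts, mean_exposure_annulus, n_pix_annulus):
    """b_pix = B / (E' * a'): background counts per second per pixel."""
    return annulus_counts / (mean_exposure_annulus * n_pix_annulus)


def expected_background(b_pix, mean_exposure_aperture, n_pix_aperture):
    """B_a = b_pix * E * a: expected background counts in the source aperture."""
    return b_pix * mean_exposure_aperture * n_pix_aperture
```

For instance, 400 counts in a 2000 pixel annulus with a mean exposure of 10 ks give a rate of 2 x 10^-5 counts per second per pixel, which corresponds to 16 expected background counts in a 100 pixel aperture with a mean exposure of 8 ks.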



3.2.2 Removing Low-Significance Sources 

The first task is to remove any sources which are statistically
of low significance, because they will not yield accurate
properties. The source and background apertures used
to determine this significance must be chosen carefully
(Section 3.2.1), but once the expected number of background
counts, $B_a$, within the source aperture, $e_f$, is known, it is
possible to assess the significance of the detected source. This
is done by computing the probability that the background
could, by chance, produce the detected number of counts
in the source aperture, assuming a Poisson distribution for
the background counts, with mean $B_a$. Those sources with
a probability higher than 0.000032 are removed from the
source list: this probability is equivalent to a 4-$\sigma$ threshold
for a Gaussian distribution. In addition, detections
comprised of only a single significant pixel are excised from the
source list, regardless of their significance. These are likely
to be hot pixels or sources that are too faint to be accurately
parameterised.
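
The significance cut can be illustrated with a few lines of Python. The sketch below assumes that the relevant probability is the upper tail P(N >= n) of a Poisson distribution with mean B_a, where n is the total number of counts detected in the source aperture; the function names are chosen for this example and are not taken from find_srcprop.

```python
import math


def poisson_tail(n_obs, mean):
    """P(N >= n_obs) for N ~ Poisson(mean)."""
    if n_obs <= 0:
        return 1.0
    if mean <= 0:
        return 0.0
    # Sum the probability mass below n_obs in log space for stability.
    cdf = sum(math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
              for k in range(n_obs))
    return max(0.0, 1.0 - cdf)


def is_significant(n_obs, expected_background, p_threshold=3.2e-5):
    """Keep a source only if the background is unlikely to produce n_obs counts."""
    return poisson_tail(n_obs, expected_background) <= p_threshold
```

With 16 expected background counts, a detection of 40 counts passes the cut, whereas a detection of 20 counts does not.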



3.2.3 Measuring Source Extents 

After low-significance sources have been removed, the
find_srcprop routine is run again on the sources above
the 4-$\sigma$ threshold, in order to classify them as point-like
or extended. For this, we need to compare the sources
to the instrument PSF. Unfortunately, no satisfactory 2-d
PSF model for XMM exists (Section 2.1.1), so for XCS we
adopted the best publicly-available 1-d (radially-averaged)
model, the Extended Accuracy Model (EAM). This, in
turn, necessitated the development of a source classification
criterion based on a 1-d source property. For XCS, we used
the Encircled Energy Fraction (EEF). The EEF records the
fraction of the total energy of a source as a function of
increasing (circular) aperture size. We note that even though
the shape of the PSF changes considerably towards large
off-axis angles, its radial average, the EEF profile, is only a
weak function of off-axis angle (Davidson 2006), making it a
good basis for a classification criterion to be applied across
the full field of view.

Figure 10. A diagram showing how the aperture used to measure
source flux is created. The source to be measured is Source A and
there are also two other objects nearby (Source B and Source C).
Both the $1e_f$ and $3e_f$ ellipses are shown for each source (red
and green respectively for Source A and dark blue and light blue
for B and C). Hence, the area used to calculate the flux for Source
A is the red plus the green region.

Our extent classification is based on testing the null
hypothesis that the measured EEF for a source is consistent
with the PSF EEF, at the appropriate off-axis angle. This
is implemented using a Kolmogorov-Smirnov (K-S) test,
using the EEF profile of the source and a model-merged PSF
EEF. The PSF EEF is derived from EAM EEFs produced
by the SAS task CALVIEW from the Current Calibration
Files (CCF) for each camera. This is weighted by the
Energy Conversion Factor (ECF, Section 2.5) appropriate for
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that ObsID. We adopt for P(point), the probability that the 
source is point-like, the maximum value of the probability
returned by the K-S test run on a 3 × 3 pixel grid (with
spacing ±0.5 pixels in x and y) around the source position (in
Section 3.3.1 we show that the positional accuracy
of XCS source centroids is typically better than 1 pixel).

The reliability of the P(point) values is a function of
several factors, including the position on the field of view,
the background level, the number of source counts, and the
proximity of neighbouring sources. For that reason, choosing
a fixed threshold in P(point) for our classification would
be inappropriate. Instead we are forced to conduct a series
of Monte Carlo (MC) simulations for every source: this is
computationally expensive, but it is vital to prevent
misclassification. This simulation process involves generating 200
realisations of the appropriate PSF EEF model and populating
them with the same number of counts as measured in
the data. Each of the 200 realisations is compared to the
model and an empirical distribution of the K-S d values is
established. If none of the simulated distributions returns a
d value as great as the measured value, we classify the source
as being extended. With this procedure, the statistical
probability of misclassifying an isolated point source as extended
is 0.005 or less. However, we note that this does not take
into account systematics, such as when two or more point
sources have been blended by Xapa into a single source
profile. These can only be removed a posteriori, by eye-balling
the extended sources that make it through to the cluster
candidate list. This eye-balling, or quality control (see Fig. 1),
process is described in more detail in M11.
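
The Monte Carlo classification described above can be illustrated with the following Python sketch. It is a simplified stand-in, not the Xapa procedure: the PSF EEF is replaced by the analytic EEF of a circular Gaussian (a placeholder for the EAM model), the K-S statistic is computed from a list of photon radii rather than from a binned EEF profile, and the search over the 3 × 3 grid of trial positions is omitted. All function names are assumptions of this example.

```python
import math
import random


def gaussian_eef(r, sigma=1.0):
    """Encircled energy fraction of a circular Gaussian PSF (placeholder model)."""
    return 1.0 - math.exp(-r * r / (2.0 * sigma * sigma))


def gaussian_eef_inverse(u, sigma=1.0):
    """Radius enclosing a fraction u (0 <= u < 1) of the placeholder PSF."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


def ks_d(radii, eef):
    """One-sample K-S statistic d between photon radii and a model EEF."""
    xs = sorted(radii)
    n = len(xs)
    d = 0.0
    for i, r in enumerate(xs):
        f = eef(r)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def is_extended(radii, eef, eef_inverse, n_sims=200, rng=None):
    """Extended if no simulated point source gives a d as large as the data."""
    rng = rng or random.Random(0)
    d_obs = ks_d(radii, eef)
    n = len(radii)
    for _ in range(n_sims):
        sim = [eef_inverse(rng.random()) for _ in range(n)]
        if ks_d(sim, eef) >= d_obs:
            return False
    return True
```

Because a source is called extended only when all 200 simulated point sources give a smaller d, an isolated point source is misclassified with a probability of about 1 in 200, which is the origin of the 0.005 figure quoted above.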



3.2.4 Correcting Artefacts 

After the second pass of find_srcprop has been completed,
we have a preliminary list of sources (classified as extended
or point-like) for a given ObsID. Initial tests showed that
these preliminary lists include a number of artefacts. These
must be corrected for before inclusion in an XCS source
catalogue (see below). The corrections are not foolproof, as
not all genuine clusters make it through to the candidate list
and not all contaminating sources are excluded, but because
the corrections are folded into the survey selection function
(Section 3.4.3), they should not impact our ability to use
XCS cluster catalogues for statistical studies.

Xapa's md_recon algorithm successfully detects sources
within sources (see Fig. 9). However, one unintended
consequence is the occasional multiple detection of a single source
that has become split into two or more overlapping sources.
This happens more often with extended sources, but can
also occur with point sources at the edge of the field of
view. Therefore, where two sources are found to have
overlapping cells, the sources are merged and the source
properties recalculated by find_srcprop (see Fig. 11). This
refinement ensures that in most cases the source flux and
morphology are recovered well.

Figure 11. An example where several initial detections of an
extended source are subsequently merged by Xapa to improve
the derived properties.

Figure 12. Source ellipses defined by Xapa for a bright, off-axis,
point source. Left: before the lobe removal step was included.
Right: after the lobe removal step was included: note that the two
point sources have still been recovered, but there is no erroneous
large (extended) ellipse enclosing both of them.

When a bright compact source lies in the outskirts of
the field of view, it can produce a significant number of
counts in the asymmetric outer regions of the PSF. We term
these objects 'point-sources-with-lobes'. The core of such
a source is detected in Run 1 of md_detect, and hence the
core counts will be masked from Run 2 (Section 3.1), but
the remaining outer counts might still yield a Run 2
detection (see Fig. 12). Removing these point-sources-with-lobes,
without also removing clusters with cuspy cores
(Section 3.1.1), proved to be one of the most difficult problems
to overcome with Xapa. After extensive tests, we arrived
at the following compromise: an extended source is excised
from the source list, as a suspected point-source-with-lobe,
if it is both located within the $3e_f$ region of a Run 1 source,
and has fewer than one fifth of the counts of that source. This
removes the majority of the lobe artefacts, but can
unfortunately also result in some genuine faint extended sources
being excluded from the XCS cluster candidate list.
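
The lobe-removal compromise amounts to a two-part condition, shown below as a small Python predicate. The function and argument names are illustrative assumptions; how Xapa determines whether an extended source lies within the 3e_f region of a Run 1 source is not reproduced here and is passed in as a boolean.

```python
def is_suspected_lobe(extended_counts, within_3ef_of_run1, run1_counts):
    """True if an extended detection should be excised as a point-source lobe.

    Both conditions from the text must hold: the extended source lies inside
    the 3e_f region of a Run 1 source, and it has fewer than one fifth of
    the counts of that Run 1 source.
    """
    return within_3ef_of_run1 and extended_counts < run1_counts / 5.0
```

A 50 count extended detection inside the 3e_f region of a 1000 count Run 1 source would be removed, while a 300 count detection in the same place would be kept.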



3.2.5 Extended Source Flags 

When developing Xapa, we had to find a compromise
between contamination and completeness, i.e. between effective
cleaning and over-cleaning of the extended source list.
Therefore, rather than removing from the extended source list
every object that could be erroneous, we have flagged certain
sources that, conservatively, we view as suspicious. Our aim
is to use the survey selection function (Section 3.4.3) to help
us understand whether flagged sources should be included in 
statistical studies or not, but to date we have taken a conser- 
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vative approach and not included them in cluster candidate 
lists, or as targets for optical follow-up (M11). The source
flags are as follows: 

(i) Extended Sources that are PSF-sized. At large off-axis
angles it is not infrequent for the flaws in the PSF model
to cause an obvious, bright, point source to be classified as
extended. Therefore, any source that is only just extended
(i.e. that has a size very close to the PSF at the respective
off-axis angle) is flagged as being 'PSF-sized' by Xapa.

(ii) Extended Sources with Internal Point Sources. Even
with the inclusion of the point-source-with-lobe test
(Section 3.2.4), the Xapa vision model (Section 3.1.2) will
occasionally misclassify flux from the outskirts of a point source
(or flux from a collection of neighbouring point sources) as
an erroneous extended source. We can mitigate this by
flagging up likely incidences. Therefore, any extended source
region that encloses one or more point sources that
contribute > 1.3 times the extended source flux is flagged as
being 'point contaminated' by Xapa.

(iii) Extended Sources with Internal Run 1 Sources. The
final flag is similar to the 'point contaminated' case, but
covers the incidences of genuine point sources, detected in Run
1 by md_detect, being erroneously passed on to Run 2 by
the cuspiness test (Section 3.1.1). Therefore, any extended
source region that encloses one or more Run 1 detection
regions that contribute at least half the extended source flux
is flagged as being 'Run 1 contaminated' by Xapa.
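
The two flux-based flags can be expressed compactly, as in the Python sketch below. It assumes that the contributions of all enclosed sources are summed before being compared with the extended source flux, which is one possible reading of the text. The 'PSF-sized' flag is left out because no numerical criterion is given for it. The function name and the flag strings as return values are choices made for this example.

```python
def contamination_flags(extended_flux, enclosed_point_fluxes, enclosed_run1_fluxes):
    """Return the set of contamination flags for one extended source.

    Summing the enclosed fluxes is an assumption of this sketch.
    """
    flags = set()
    if enclosed_point_fluxes and sum(enclosed_point_fluxes) > 1.3 * extended_flux:
        flags.add("point contaminated")
    if enclosed_run1_fluxes and sum(enclosed_run1_fluxes) >= 0.5 * extended_flux:
        flags.add("Run 1 contaminated")
    return flags
```

Any source returning a non-empty set would, under the conservative approach described above, be left out of the cluster candidate list.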



3.2.6 Source Parameters 

Once the source list per ObsID has been cleaned of artefacts,
a file is generated that saves all the relevant data. This file
is then interrogated when the survey-wide database is being
generated (Section 3.2.7). The following attributes are saved
per source:

(i) The centroid location in image coordinates; 

(ii) The centroid location in sky coordinates (J2000); 

(iii) The centroid location in radial coordinates, i.e. the 
off-axis angle (arcminutes) and the azimuthal angle (de- 
grees); 

(iv) The major axis, minor axis and orientation of the 
source ellipse; 

(v) The average exposure time at the source location
(seconds);

(vi) The 0.5-2.0 and 2-10 keV background-subtracted
source counts (in the merged image and in the individual
camera exposures);

(vii) The 0.5-2.0 and 2-10 keV background-subtracted
count-rates and 1-$\sigma$ count-rate uncertainties (in the merged
image and in the individual camera exposures);

(viii) The source significance and extent probability;

(ix) The value of the source flags (see Section 3.2.5).

3.2.7 Master Detection List 

Xapa produces a source list for each of the input ObsIDs,
and these lists are then concatenated to form a Master Detection
List (MDL). Present in the XMM archive are many areas
that have been observed multiple times. As a result, some
sources will have been detected by Xapa multiple times.



When duplicates are found, only the detection with the 
most soft-band counts is passed to the MDL. To remove 
duplicates, it is necessary to set an appropriate matching 
radius. The positional accuracy of the survey is higher for 
point sources than for extended sources, so it makes sense to 
use a different radius for each type. The accuracy for point 
sources varies as a function of off-axis and azimuthal angles 
(amongst other parameters). However, for simplicity we use 
a single value for the radius of 5 arcsec. The case for ex- 
tended sources is less straightforward because of the variety 
of source types and morphologies. The positional accuracy 
for large diffuse objects, such as low-redshift clusters, can 
be very poor, making it hard to pick an appropriate radius. 
Fortunately, the largest diffuse sources should have already 
been masked from their host ObsID. So, for XCS, we use 
a fixed matching radius of 30 arcsec for extended sources. 
This radius is large enough to allow reliable source match- 
ing, but small enough to minimise removal of genuine cluster 
candidates. 
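
The duplicate removal can be sketched in Python as follows. This is an illustration under stated assumptions, not the XCS code: detections are plain dictionaries with hypothetical keys, matching is only attempted between detections of the same type (point or extended), and the list is processed greedily in order of decreasing soft-band counts so that the brightest detection of each source is the one retained.

```python
import math

POINT_RADIUS_ARCSEC = 5.0
EXTENDED_RADIUS_ARCSEC = 30.0


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation of two sky positions (degrees in, arcsec out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def remove_duplicates(detections):
    """Keep, for each source, only the detection with the most soft-band counts.

    Each detection is a dict with keys 'ra', 'dec' (degrees), 'soft_counts'
    and 'extended' (bool); these key names are assumptions of this sketch.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d["soft_counts"], reverse=True):
        radius = EXTENDED_RADIUS_ARCSEC if det["extended"] else POINT_RADIUS_ARCSEC
        is_duplicate = any(
            k["extended"] == det["extended"]
            and separation_arcsec(k["ra"], k["dec"], det["ra"], det["dec"]) <= radius
            for k in kept)
        if not is_duplicate:
            kept.append(det)
    return kept
```

Two point-source detections 2 arcsec apart collapse to the brighter one, while a third point source 10 arcsec away is retained as a separate entry.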

As of May 1st 2011, Xapa had run on 4,029 ObsIDs,
resulting in 114,711 point sources and 12,582 extended
sources being included in the MDL. Of the 12,582 extended
sources, roughly half were flagged (§ 3.2.5) and these were
removed from the list of potential cluster candidates
(leaving 6,983 sources). Additional cuts to this list included the
removal of sources within 20° of the Galactic plane and 6°
[3°] of the Large [Small] Magellanic Cloud. Those cuts were
made because it can be hard to carry out effective optical
follow-up in regions of high projected stellar density.
Moreover, the closer one gets to the Galactic plane, the higher
the hydrogen column (large $n_{\rm H}$ values impact our ability to
recover accurate source fluxes). Further cuts, see below, are
then imposed to ensure that the vast majority of XCS
cluster candidates are genuinely serendipitous detections, rather
than the intended target of the ObsID. A final cut, on
minimum source count (> 50), is then applied, leaving 3,675
sources drawn from 1,533 different ObsIDs; when we use the
term 'candidate' hereafter, we are referring to these 3,675
sources. The candidates have a range of counts, from 50 to
several thousand. Of particular interest to the cosmology
and evolution studies we plan with XCS are the 993 with
more than 300 counts, because these should deliver, once
redshift information is available, reliable temperature
estimates (Fig. 17).



As mentioned above, filters were applied to exclude non- 
serendipitous or 'target' objects from the candidate list. The 
targets in question are primarily clusters, but other types of 
extended X-ray sources should also be excluded (e.g. low 
redshift galaxies). It is also important to identify extended 
sources that are physically associated with the target, e.g. 
if they both belong to the same supercluster. The target 
filters involved both checks of the ObsID file headers and 
automated queries to the NASA Extragalactic Database
(NED). The filters were run separately on each ObsID that
a particular extended source was detected in. A given
extended source (that passed the other cuts described above)
was included in the candidate list if it was classed as being a
serendipitous detection in at least one of those ObsIDs (even
if it was classed as being the target of one or more others).
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We acknowledge that some XMM targets, and some sources 
associated with targets, do make it through into our
candidate list. However, as shown in M11, these are
straightforward to remove at the quality control stage (eight such
examples were removed from XCS-DR1).

Extended sources were excluded from the candidate list 
if they met one or more of the following criteria: 

(i) Their Xapa centroid fell within 2 arcmin of the aim 
point of an ObsID with an object classification (as listed in 
the header) of 'cluster' or 'group'. 

(ii) Their Xapa ellipse overlaps the aim point of an ObsID 
with a target name (as listed in the header) that has been 
associated with a cluster or group in NED. (This filter is 
necessary either when the pointing type is not included in 
the header, or is incorrect.) 

(iii) Their Xapa ellipse overlaps the centroid of a cluster 
or group in NED, when the aim point of the ObsID falls 
within 2 arcmin of that cluster or group. (This filter is nec- 
essary because sometimes non standard target names are 
listed in the header.) 

(iv) Their redshift is within 5000 km s$^{-1}$ of the redshift
of an object in NED, when the target name (as listed in the 
header) has been associated with that object. Both redshifts 
are automatically extracted from NED. (This filter is neces- 
sary because some ObsID targets are deliberately positioned 
off axis. This filter also reduces the number of sources en- 
tering the candidate list that are physically associated with 
targets (including with non-cluster targets, such as AGN).) 

(v) Their Xapa ellipse overlaps the aim point of an ObsID 
with a target name (as listed in the header) that has been 
associated with a known galaxy in NED. (This filter was 
found to be the most effective way to exclude low redshift 
galaxies from the candidate list.) 



3.3 Xapa Verification: Point Sources 



As mentioned above (Section 3.2.7), Xapa has catalogued to
date in excess of 100,000 unique point sources. In this section
we test Xapa astrometry and flux measurements using these
point sources, finding both measures to be robust.



3.3.1 Positions 

To determine the positional accuracy of the XCS point
sources, it is desirable to use a catalogue that has high
spatial resolution and astrometric precision. It would also
need to have significant overlap with the XMM archive. A
natural choice for this is the Sloan Digital Sky Survey (SDSS,
http://www.sdss.org; Abazajian et al. 2009); the data are of
high quality and contain many objects that would be expected
to have X-ray counterparts, e.g. quasars and AGN. A cross
match of XCS point sources against the SDSS Quasar Catalog IV
(Schneider et al. 2007) using a radius of 10 arcsec produces
1131 matches. This was extended further using the catalogue of
Veron-Cetty & Veron (2006; VeronCat hereafter). VeronCat is a
compilation of all known AGNs and QSOs (including those in
the SDSS). A 10 arcsec matching radius returns 2807 matches,
the distribution of which can be seen in Fig. 13. We have
determined the chance of false association between the XCS
and VeronCat with a 10 arcsec matching radius to be 1%. The
mean matching distance is 2.6 arcsec, and 95% of the matches
fall within 6.6 arcsec. This level of precision is consistent
with previous determinations of the positional accuracy
obtainable with XMM data (Watson et al. 2009).
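The positional cross match described in Section 3.3.1 can be illustrated with a minimal sketch. This is not the XCS matching code: the function names and the brute-force nearest-neighbour search are assumptions made for illustration, and only the 10 arcsec matching radius is taken from the text.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two positions given in degrees.

    Uses the haversine formula, which is numerically stable at small angles.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, a)))) * 3600.0


def cross_match(sources, reference, radius_arcsec=10.0):
    """Nearest reference object within radius_arcsec for each source.

    sources, reference: lists of (ra_deg, dec_deg) tuples (hypothetical inputs).
    Returns a list of (source_index, reference_index, separation_arcsec).
    """
    matches = []
    for i, (ra, dec) in enumerate(sources):
        best = None
        for j, (ref_ra, ref_dec) in enumerate(reference):
            sep = angular_separation_arcsec(ra, dec, ref_ra, ref_dec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches
```

For catalogues with around 100,000 entries a brute-force loop like this would be slow; a spatial index would be the usual choice, but the matching criterion would be the same.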



3.3.2 Fluxes 

To assess the accuracy of the point source fluxes measured
by Xapa we have compared the XCS point source list to
the XMM Serendipitous Survey 2XMM catalogue (Watson
et al. 2009). This catalogue is the ideal counterpart to XCS
because it is also based on automated pipeline analysis of the
entire XMM archive. A 10 arcsec matching radius has been
used to compare the samples. Fig. A2 shows the flux
comparison from the individual cameras aboard XMM, using a
0.5-2.0 keV band. There is clear consistency between the
two surveys, with no significant systematic offsets. It is
important to note that the default Xapa fluxes for extended
sources are not similarly reliable. This is for two reasons:
first, the ECFs used to generate the fluxes relate to
power-law spectra (whereas extended sources are more likely to
have thermal spectra) and second, the fluxes have not been
properly corrected for any source flux lying outside the Xapa
defined ellipse. In Section 4.3 we describe how aperture
corrected energy fluxes are determined for the candidates.



3.4 Xapa Verification: Extended Sources 



As mentioned above (Section 3.2.7), Xapa has catalogued to
date in excess of 10,000 unique sources that have been
statistically classified as extended. Xapa is not infallible
however, and some of the objects in the candidates list will
be erroneous - because they are blends of point sources or
other artefacts of the data reduction - and a small fraction
will be other types of genuinely extended X-ray sources (such
as nearby galaxies or supernova remnants). Nevertheless most
of them will be clusters. In this section we first compare the
Xapa determined extents for the clusters in the XCS-DR1
sample to those in the 2XMM catalogue (Section 3.4.1). We
then compare the candidate list to the cluster sample of the
XMM-LSS survey in the same ObsIDs (Section 3.4.2). We
then describe how we quantify the completeness level using
simulations of our selection function (Section 3.4.3). We
note that it is harder to quantify the contamination (due to
blends and artefacts) level than the completeness level. In
XCS we do not use simulations for this, but rather examine
each source (and its optical counterpart) by eye (M11).



3.4.1 Comparison with 2XMM 

To investigate the quality of the Xapa determined source
extent, we have used the 394 XCS-DR1 clusters with matches
to extended sources in the 2XMM catalogue. For these
clusters, we have compared the Xapa major axis to the 2XMM
extent measure. In the latter case, the quoted value is
equivalent to the core radius of a β-profile (Eqn. 2), so is
always smaller than the Xapa value, typically by a factor of
5. The two measures are correlated (correlation coefficient
of 0.514), with ~30 per cent scatter about the best fit
relation. Therefore, despite the very different methods by
which extents are measured by the two surveys, both
descriptions are useful when determining source sizes. We note
that only 49 (11 per cent) of the XCS-DR1 clusters in
ObsIDs processed by 2XMM were not classified as extended
sources in the 2XMM catalogue. Of course there might well
be other clusters that 2XMM has detected as extended, but
that Xapa has not.

Figure 13. The relative position of the matches between XCS and
VeronCat (Veron-Cetty & Veron 2006) source positions. The dashed
line represents the 95% matching radius.



3.4.2 Comparison with the XMM-LSS 

The XMM Large Scale Structure (XMM-LSS) Survey is
reported in Pierre et al. (2006) and Pacaud et al. (2007). It
covers a single contiguous region of roughly 6 deg^2,
comprising 51 ObsIDs, in which the authors have undertaken
a dedicated cluster survey, accompanied by a detailed
selection function. In this region they detected 33 'Class 1'
extended objects. This class is designed to be
uncontaminated by mis-classified point sources. A more
detailed examination of these objects (including optical
overlays, photometry and spectroscopy) has confirmed 28 of
these to be genuine clusters; the remaining 5 were shown to
be nearby X-ray emitting galaxies. Twenty-nine of the 33
Class 1 objects have counterparts in XCS that were classified
as extended by Xapa. This includes 2 of the non-cluster
objects. Three of the remaining four Class 1 objects were
detected by Xapa, but classified as point-like. The final
object (XLSS J022210.7-024048) was detected by Xapa, but
subsequently removed from the source list because it did not
meet our 4-σ significance requirement.

The radius used in the matching of Xapa sources to
the XMM-LSS was typically 10 arcsec. However, for XLSS
J022433.8-041405 a radius of 24 arcsec was required to get a
match; this source is large and elliptical, hence there is some
uncertainty in the source centre, though the extents of the
XCS source and its XMM-LSS counterpart overlap.

The XMM-LSS also have a C2 class of clusters with
slightly less conservative selection criteria. This sample has
yet to be published, but the authors report this class to
contain ~60 sources. Within the XMM-LSS ObsIDs, XCS
detects 82 extended objects without flags (Section 3.2.5), so
the overlap is likely to be substantial.



3.4.3 Selection Function: Method 



Pioneering work by Adami et al. (2000), and later by
Burenin et al. (2007), demonstrated the impact of complex
selection effects on cluster samples derived from X-ray
surveys. Pacaud et al. (2007) have shown, using the XMM-LSS
Class 1 sample described above, that the measured
evolution in the normalisation of the Lx-Tx relation is
significantly affected by selection biases. In another X-ray
study, Mantz et al. (2010) provide an in-depth discussion of
Malmquist and Eddington biases and their effect on
measurements of scaling relations. Optical and SZ cluster
surveys are also increasingly supported by selection function
simulations (Melin et al. 2005; Koester et al. 2007).

The ability to measure selection functions for XCS was
embedded at the outset in Xapa. Indeed, one of the driving
reasons behind us designing our own source detection
pipeline, rather than using the excellent data products
available from the XMM Survey Science Centre (Watson et al.
2009), was the requirement that we needed to be able to
quantify the extended source selection function using
synthetic clusters. In the following, we describe how the
selection function tests are carried out and present some
results.

Our approach follows a general method in which
synthetic cluster profiles are added to EPIC merged images,
which are then run through Xapa. The angular size of the
synthetic cluster profile is determined from the angular
diameter distance at the chosen input redshift. The profile is
then randomly positioned into a blank XMM 'image', with
a uniform probability across the field of view, and then
convolved with the appropriate PSF model. For this purpose we
use the two-dimensional Medium Accuracy Model (MAM,
Section 2.1.1). This is a natural choice of PSF model for
the selection functions because it accounts for the azimuthal
variation in the PSF, and also because the alternative model
(EAM, Section 2.1.1) is implemented in Xapa for source
classification (Section 3.2.3): to keep the simulations fair,
we cannot use the same model for blurring as we do for
extent classification. The convolution with the PSF creates a
probability density function (PDF) for the synthetic cluster
profile. We note that the shape of the synthetic cluster
profile depends on the user's specific requirements, and we
will discuss some examples in Sections 3.4.4 & 3.4.5.

Next, an ObsID is chosen for the synthetic cluster to be
placed in. The choice of ObsID will depend on the particular
test being undertaken. For example, one might want to know
the detection sensitivity in a particular ObsID, or one might
want to know the detection sensitivity for a set of ObsIDs,
e.g. those with similar nH or exposure times. The synthetic
cluster is added to the chosen ObsID as follows:

(i) The absorbed count-rate of the cluster profile is
determined from the gridded LCFs (Section 2.5) for that ObsID,
so that it matches the synthetic cluster's luminosity,
temperature and redshift.

(ii) The cluster PDF is normalised to the LCF predicted 
count-rate, thus creating a count-rate image. 

(iii) The count-rate image is converted into a count-image
by multiplying by the appropriate exposure map.

(iv) The synthetic count-images for the individual cam- 
eras are then added to the respective real images (Sec- 
tion [S3. 

(v) The individual images are then added to make a 
merged image. 
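Steps (ii) to (v) above can be sketched as follows. This is an illustration under stated assumptions, not the XCS implementation: images are plain nested lists, the PSF convolution is omitted, the synthetic counts are added as expected (non-integer) values rather than as a random photon realisation, and all function names are hypothetical. The surface brightness form S(r) ∝ [1 + (r/r_c)^2]^(-3β + 1/2) is the standard circular β-profile.

```python
def beta_profile_pdf(size, x0, y0, core_radius_pix, beta=2.0 / 3.0):
    """Circular beta-profile on a size x size pixel grid, normalised to sum to 1.

    PSF blurring is deliberately left out of this sketch.
    """
    image = [
        [
            (1.0 + ((x - x0) ** 2 + (y - y0) ** 2) / core_radius_pix ** 2)
            ** (-3.0 * beta + 0.5)
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(map(sum, image))
    return [[value / total for value in row] for row in image]


def inject_cluster(real_image, pdf, exposure_map, count_rate):
    """Steps (ii)-(iv): scale the PDF to a count-rate image, multiply by the
    exposure map (seconds) to get counts, and add them to the real image."""
    return [
        [real + count_rate * p * exposure
         for real, p, exposure in zip(real_row, pdf_row, exp_row)]
        for real_row, pdf_row, exp_row in zip(real_image, pdf, exposure_map)
    ]


def merge(images):
    """Step (v): pixel-wise sum of the per-camera images."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*images)]
```

In this sketch `count_rate` stands in for the LCF-predicted value of step (i), which depends on the ObsID and camera and is not modelled here.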

The resulting merged image, containing the synthetic
cluster, is then processed by Xapa in the standard way.
There are two criteria that must be met in order for an input
synthetic cluster to be deemed successfully 'recovered' by
Xapa: the detection software must identify a source at the
synthetic cluster location, and that source must be classified
as extended. This has to be a new source; if the synthetic
cluster happens to have been placed at random close to a
previously detected real extended source, then the synthetic
cluster is not classed as having been recovered (even if its
'counts' dominate those from the real source). Depending
on the application, we might further require that the new
detection not be flagged (see Section 3.2.5). It is not
sufficient to perform the synthetic cluster recovery test only
once; rather, one must perform it multiple times to ensure
an accurate measurement of the recovery probability for a
given set of input parameters. There is so much parameter
space to be tested (see below) that the number of selection
function tests can run into the millions for certain
applications. Determining the survey selection function is by
far the most computationally demanding part of XCS.
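The need for repeated trials can be illustrated with a short sketch. The callable `run_trial` is a hypothetical stand-in for one complete insert-and-detect cycle (it returns True if the synthetic cluster was recovered), and the binomial standard error is an assumption about how the uncertainty might be quoted; the text does not say how XCS estimates it.

```python
import math
import random


def recovery_fraction(run_trial, n_trials, seed=0):
    """Estimate the recovery probability from repeated independent trials.

    run_trial(rng) -> bool is a placeholder for placing one synthetic cluster
    at a random position and running the detection pipeline on the result.
    Returns (fraction_recovered, binomial_standard_error).
    """
    rng = random.Random(seed)
    recovered = sum(1 for _ in range(n_trials) if run_trial(rng))
    fraction = recovered / n_trials
    error = math.sqrt(fraction * (1.0 - fraction) / n_trials)
    return fraction, error
```

Because the error only falls as the inverse square root of the number of trials, halving it requires four times as many runs per parameter combination, which is one way to see why the total can reach millions.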

3.4.4 Selection Function: Results (Analytical Models)

The simplest profile type that we have studied is that of
an isothermal β-profile cluster (Eqn. 2). Using this profile
we have tested the selection function dependency on
cluster parameters (e.g. redshift, temperature, luminosity,
core radius, profile slope and ellipticity); on image
parameters (e.g. exposure time, off-axis angle, azimuthal
angle); and on cosmological parameters (e.g. k and Ωm, the
curvature and present mean mass density of the Universe
respectively). Some results from the β-profile selection
function runs have already been published (Sahlen et al.
2009). In Figs A3, A4 and 14 we show some additional results.

Figure 14. Predicted recovery efficiency for 3 keV clusters as
a function of core radius and recovered counts. The synthetic
clusters used for this test had circularly symmetric β-profiles
(β = 2/3).

Figures A3 and A4 show how the selection function
depends on cluster luminosity and redshift. We show results
for 3 keV and 6 keV clusters with a range of luminosities.
In all cases, the input profiles were spherically symmetric,
with r_c = 160 kpc and β = 2/3. The profiles were placed
randomly in a subset of ObsIDs chosen to be a
representative sample of the whole archive, i.e. to have a
similar distribution of exposure times and Galactic latitudes.
This simple test confirms that bright clusters can be
consistently detected out to redshifts of at least 1, whilst
fainter clusters can only be found with reasonable certainty at
lower redshifts. We note that typical Lx values at 3 keV and
6 keV are roughly 1 and 10 x 10^44 erg s^-1 respectively,
based on the low-redshift Lx-Tx relation of Arnaud & Evrard
(1999). Therefore, we can expect to detect roughly 60% and
85% of 3 and 6 keV clusters respectively at z = 0.6, but only
10% and 75% at z = 0.9 (assuming no evolution in the Lx-Tx
relation).

Fig. 14 shows how the selection function depends on
cluster angular size. Here we have run a set of 3 keV clusters
through the selection function process with physical core
radii varying according to the findings of Jones & Forman
(1984), i.e. in the range of 50 kpc to 400 kpc. Over the
range of redshifts probed by XCS (0.1 < z < 1.0), these
core radii have an angular size in the range 217 to 7 arcsec.
Fig. 14 shows the fraction of clusters recovered by Xapa
as a function of both angular size and the number of input
synthetic cluster source counts. For clusters with more than
300 counts, the cluster recovery rate is good (>70%) when
the extent is in the range ~10-20 arcsec. These limits
roughly translate to 0.1 < z < 0.6 for r_c = 50 kpc and
z > 0.3 for (more typically) r_c = 160 kpc.
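The conversion from physical core radius to angular size can be sketched numerically. The code below assumes a flat ΛCDM model with the concordance values quoted in Section 4 (H0 = 70 km s^-1 Mpc^-1, Ωm = 0.27, ΩΛ = 0.73); whether exactly these values were used for the angular sizes quoted above is an assumption, and the function names are illustrative.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance_mpc(z, h0=70.0, omega_m=0.27, omega_l=0.73,
                                  steps=1000):
    """Angular diameter distance (Mpc) in a flat LCDM cosmology.

    The comoving distance integral is evaluated with Simpson's rule;
    steps must be even.
    """
    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    comoving = (C_KM_S / h0) * total * h / 3.0
    return comoving / (1.0 + z)


def angular_size_arcsec(size_kpc, z, **cosmology):
    """Small-angle angular size (arcsec) of a physical length at redshift z."""
    d_a_kpc = angular_diameter_distance_mpc(z, **cosmology) * 1000.0
    return math.degrees(size_kpc / d_a_kpc) * 3600.0
```

Under these assumptions, `angular_size_arcsec(400.0, 0.1)` comes out close to the upper end of the range quoted above, and `angular_size_arcsec(50.0, 1.0)` gives a value of a few arcsec at the lower end.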
3.4.5 Selection Function: Results (Numerical Models)

We have also investigated the recovery of clusters with
profiles drawn from cosmological hydrodynamic simulations.
For this purpose we have used clusters from the CLEF
(CLuster Evolution and Formation in Supercomputer
Simulations with Hydro-dynamics) simulation (Kay et al. 2007).
The use of the CLEF clusters has enabled an investigation
into the effects of more realistic (than β-profiles) cluster
shapes on the XCS selection function, because CLEF
includes clusters with cool cores and substructure.

The process by which the CLEF profiles are input into
the XMM images is the same as for the analytical models.
For simplicity, we use the CLEF catalogued mean cluster
temperature, rather than the full temperature map, when
calculating the total count-rates using the gridded LCFs
(Section 2.5). These count-rates are then distributed using
the emission measure profile (see Onuora et al. 2003 for
details) as a probability map. The emission measure maps fully
encode variations in temperature and density and so this
approach will preserve any substructure in the surface
brightness and also the presence of any central peak caused by
a cool core.

The selection function work using CLEF has shown that
strong peaks in the surface brightness profile, either due to
substructure, or to a cool core, make clusters easier to detect
than β-profiles. However, these clusters do not always make
it into the 'recovered list', because they are either
misclassified as point sources or flagged as being PSF-sized
extended sources (Section 3.2.5). This misclassification trend
is mitigated if the total number of detected counts is large
enough to sample more of the extended profile, and is almost
completely resolved at the 500-counts-per-source level
(Hosmer 2010).

The CLEF investigation has further shown that
symmetrical β-profiles are an acceptable approximation to the
XCS selection function. This is important because CLEF,
and most other hydro-dynamical samples, are only
available for a single assumed underlying cosmology. In order
to use XCS to measure the underlying cosmology, we need
to know the selection function across a broad range of
cosmological parameters. The suitability of the β-profiles was
demonstrated by comparing the results of two duplicate
selection function runs. The first used the CLEF cluster
profiles, each input multiple times to determine recovery
efficiencies. The second run replaced the CLEF clusters with
isothermal β-profiles (r_c = 160 kpc, β = 2/3), whilst
keeping all other aspects the same (ObsID, location, luminosity,
temperature, redshift and input cosmology). The results,
after a 500-count detection limit has been imposed, are shown
in Fig. 15. Plotted in red is the average recovery efficiency
obtained using the CLEF cluster sample, and over-plotted
are the data from using the β-profiles (dotted black line).



4 ANALYSIS OF XCS CLUSTER 
CANDIDATES 

In this section we describe a further set of XCS analysis
pipelines. These re-examine the XMM observations of the
candidates delivered by Xapa. The first pipeline examines
each of the candidates in turn, jointly interrogating the
respective ObsID and NED in the search for published
redshifts (Section 4.1). The second pipeline carries out batch
X-ray spectroscopy on candidates with redshift measurements
and delivers measurements of the X-ray temperature
(Section 4.2). The third outputs total luminosities, by fitting
spatial profiles to those candidates with Tx measurements
(Section 4.3). The fourth pipeline has been designed to
estimate redshifts directly from the X-ray data (Section 4.4).
The methodology of all of these pipelines is described
below, together with a range of verification tests. Here, all
quantities are calculated assuming a standard concordance
cosmology (H0 = 70 km s^-1 Mpc^-1, Ωm = 0.27 and ΩΛ =
0.73) and all quoted luminosities are bolometric and within
radii where the over density is 500 relative to the critical
density (R500).

Figure 15. Predicted recovery efficiency of CLEF (Kay et al.
2007) and β-profile clusters as a function of redshift. A 500-count
cut has been imposed, where the counts are as measured by the
Xapa pipeline. The β-profile clusters are paired with a CLEF
counterpart, in that they have the same redshift, temperature,
luminosity and location in the respective ObsID.



4.1 Automated NED Queries: Literature 
Redshifts 

Redshift measurements are essential if we are to use the
candidates for science applications. However, with 3,675
candidates in our current catalogue (May 1st 2011), we need to
find automated ways to derive as much redshift
information as possible. In order to automatically identify
redshifts that are already in the literature, we have constructed
an algorithm that searches the NASA/IPAC Extragalactic
Database (NED). These 'literature redshifts', or z_lit, are only
available for a small fraction of the candidates, but they are
still extremely important, in that they allow us to check our
other redshift estimation techniques, in particular the X-ray
redshifts described below (Section 4.4) and the red-sequence
redshifts described in our companion catalogue paper (M11).

The NED search was carried out for all the candidates.
An initial search extracts all sources, classified as either a
galaxy or a cluster, within a 30 arcmin search radius of the
candidate centroid. Then, for every extracted object with a
catalogued redshift, we calculate a crude placeholder
luminosity, Lx,ph. The Lx,ph is derived using the gridded LCFs
(Section 2.5) and the soft-band Xapa count-rate. From the
Lx,ph, we then estimate a corresponding placeholder
temperature, Tx,ph, for the candidate using the Lx-Tx
relation of Arnaud & Evrard (1999). From the Tx,ph we can
then estimate a placeholder R500,ph value (R500 is the
radius from the cluster centre that represents an over density
of 500 times the critical density), using the prescription in
Arnaud et al. (2005), and a corresponding
redshift-appropriate angular search radius, θ500,ph. The
velocity dispersion-temperature relation of Bird et al. (1995)
is used in a similar way to estimate a placeholder velocity
dispersion, σv,ph, for the candidate.



Any NED objects that lie outside their respective
θ500,ph are discounted as a true match. If any lie inside,
then those classified in NED as clusters are then considered
as a potential match. Should there be only one such object,
then that is chosen as the best match. If there is more than
one, then the object with the smallest positional offset is
chosen. If no objects classified in NED as clusters fall inside
the search radius, but some galaxies do, we then look for
groupings of galaxy redshifts within the (θ500,ph, σv,ph)
volume. If more than one grouping of galaxies is found, then
the one with the smallest positional offset is chosen as the
best match. When the query was run (May 1st 2011), a total
of 493 candidates were associated with published redshifts
for clusters (412) or galaxy groupings (81) in NED.
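The cluster branch of the matching logic above can be sketched as follows. This is an illustration only: the dictionary keys are hypothetical, each NED object is assumed to arrive with its positional offset and its own placeholder search radius already computed, and the galaxy-grouping branch is not implemented because the text does not specify how groupings are identified.

```python
def best_cluster_match(ned_objects):
    """Pick the best NED cluster match for one candidate.

    ned_objects: list of dicts with hypothetical keys
        'name', 'type' ('cluster' or 'galaxy'),
        'offset_arcmin' (distance from the candidate centroid) and
        'theta500_ph_arcmin' (placeholder search radius at that object's redshift).
    Returns the chosen dict, or None if no cluster lies inside its own radius
    (in which case the galaxy-grouping search would be attempted instead).
    """
    inside = [obj for obj in ned_objects
              if obj["offset_arcmin"] <= obj["theta500_ph_arcmin"]]
    clusters = [obj for obj in inside if obj["type"] == "cluster"]
    if not clusters:
        return None
    # One cluster: it is the match. Several: smallest positional offset wins.
    return min(clusters, key=lambda obj: obj["offset_arcmin"])
```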



The NED redshifts were then passed to the 'Redshift
Followup (Archive)' stage of the XCS pipeline (Fig. 1) and
individually checked. In doing so, it was discovered that
some matches are wrong, i.e. the XCS source is not
associated with the selected NED cluster. This is especially
true at low NED redshifts (where the allowed matching
radius is large). We found that imposing a redshift limit of
z = 0.08 was effective at removing the erroneous matches,
although this reduced the number of NED redshifts available
to 345. Of these 345 candidates, 218 passed the quality
control stage and made it into the XCS-DR1 sample. That is
not to say that the remainder are not clusters, but rather
that they cannot be confirmed as being so using the
currently available optical and X-ray data (see M11). Of these
218 only 127 are listed in XCS-DR1 with the automatically
selected NED redshift. This is because, in the other cases,
alternative redshifts were available. The alternative redshifts
mostly came from either our own observations or from our
analysis of the SDSS archive, but in eight cases they came
from the literature. In those eight cases, the NED redshift
listed for the cluster had not been updated to reflect more
recent optical follow-up. The tendency for NED to retain
outdated redshift information first became apparent to us
when we compared the NED redshift (z = 1.2) for the
highest redshift XCS cluster (XMMXCS J2215.9-1738) to the
value we published (z = 1.46) based on 31 secure
spectroscopic redshifts (Hilton et al. 2010). The z = 1.2 value
had been taken from Olsen et al. (2008) and was based on
single-band (i) photometry.



4.2 Spectral Fitting: X-ray Temperatures

In this section we describe our pipeline to measure X-ray
temperatures for candidates with a secure redshift
measurement; the Tx-pipeline hereafter. In Section 4.2.1 we
explain how spectra are extracted and corrected for background
contamination. Next we describe how these spectra are fitted
to X-ray models and how parameter uncertainties are
calculated (Section 4.2.2). Both of these tasks are carried out
in an automated fashion so, to assess their efficacy, we have
carried out a series of tests. These tests are described in
Section 4.2.3.



4.2.1 Generating the Spectra

Spectra are generated for every candidate with an
associated redshift measurement. The first step is to establish
all the ObsIDs in which the candidate was observed and all
the exposures within them. We need to do that to ensure
we have the maximum number of source counts available to
carry out the fit. In the simplest case, the candidate will
have only been observed in the ObsID listed in the MDL
(Section 3.2.7), and there will be only three sub-exposures
to exploit (one each for EPIC-mos1, EPIC-mos2 and
EPIC-pn). However, in other cases the candidate might be
covered by multiple ObsIDs (only the one generating the most
soft counts is listed in the MDL). Moreover, there can also be
multiple exposures within an ObsID, especially if the
exposure time is long and had to be broken up over several
satellite orbits. Finally, in some cases, one or more of the
cameras might have been turned off, so fewer than 3
exposures are available.

When all the exposures have been gathered then the
cleaned event lists, described earlier (Section 2.3.3), are used
to generate spectra. Only photons in the 0.3 keV to 7.9 keV
band are used for this (the telescope is poorly calibrated at
softer energies and the spectra are background dominated at
higher energies). The regions used to extract the source
spectra are the ellipsoidal regions that Xapa defined for the
respective candidate, although, if other Xapa sources
overlap with any part of such a region, then events from those
pixels are not included when the spectra are produced. The
redistribution matrices and area response files necessary for
spectral analysis are then created, using the XMM SAS
package. These files are ObsID, camera and position dependent
and so one needs multiple sets for each candidate.

Every source spectrum generated needs an associated 
field spectrum for the purposes of background subtraction. 
The background subtraction in the Tx-pipeline was done 
using an in-field method, since XCS clusters do not gener- 
ally have large angular sizes. The background spectra were 
usually taken from a circular annulus around the source, al- 
though in the case of sources very near the edge of the field of 
view, an ellipse perpendicular to the off-axis direction, with 
a circular region centred on the cluster excluded, was used 
instead. The outer radius of the background annulus is 1.5 
times the Xapa defined major axis of the respective candi- 
date. The inner edge varies depending on the exposure, but 
is no less than 1.05 times the major axis. Any pixels within 
the background region that overlap with other Xapa sources 
are excluded from the background spectrum. The normali- 
sation of the background is performed within XSPEC and 
reflects the ratio of the number of pixels in the source and
background extraction regions. 

4.2.2 Spectral fitting

The spectral fitting was carried out using XSPEC. The
fitting was done using the maximum likelihood Cash
statistic (Cash 1979). As mentioned above, there can be
multiple spectra per candidate and these were usually all fitted
simultaneously. The only exceptions were very low-count
spectra, i.e. those with either fewer than 10 soft-band counts
in total or those with less than 10% of the soft-band counts of
the spectrum with the most counts. These spectra were
excluded from the simultaneous fit because it was found that
they degraded the fits.

In XSPEC the photons within each spectrum are 
grouped into bins before fitting. For the Tx-pipeline we var- 
ied the minimum number of counts per bin according to the 
total number of counts in the spectrum. That way, higher 
signal-to-noise spectra could be fitted to higher spectral res- 
olution (and vice versa). For spectra with fewer than 250 
counts, the minimum was set at one count per bin. For spec- 
tra with more than 850 counts, the minimum was five. In 
between those limits, the minimum was scaled between 1 
and 5 counts using a power-law with an index of 0.75. This 
particular scaling of the minimum number of counts per bin 
was chosen after carrying out spectral simulations. It was de- 
signed to minimise the bias in the derived parameters while 
also minimising the statistical uncertainties. 
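The count-dependent grouping can be sketched as below. The text fixes only the end points (one count per bin below 250 counts, five above 850) and the power-law index of 0.75; the exact functional form between those limits is not given, so the interpolation used here, and the rounding to an integer, are assumptions made purely for illustration.

```python
def min_counts_per_bin(total_counts, low=250, high=850,
                       min_low=1, min_high=5, index=0.75):
    """Minimum counts per spectral bin as a function of total spectrum counts.

    ASSUMED form between the limits:
        min_low + (min_high - min_low) * ((N - low) / (high - low)) ** index
    rounded to the nearest integer. Only the limits and the index come
    from the text.
    """
    if total_counts <= low:
        return min_low
    if total_counts >= high:
        return min_high
    fraction = (total_counts - low) / (high - low)
    return int(round(min_low + (min_high - min_low) * fraction ** index))
```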

Four different models are fitted to the data. All of
the models include a photoelectric absorption component
(WABS; Morrison & McCammon 1983) to simulate the nH
absorption and a hot plasma component (MEKAL; Mewe
et al. 1986) to simulate the X-ray emission from the ICM.
The different models are:

(i) WABS*MEKAL with the hydrogen column, nH,
frozen at the Dickey & Lockman (1990) value and the
metallicity, Z, frozen at the canonical, 0.3 Z⊙, value.

(ii) WABS*MEKAL but with nH and Z allowed to vary.

(iii) WABS*(MEKAL+POWERLAW), as (ii), but in- 
cluding an extra power-law component to simulate a po- 
tential contaminating point source. 

(iv) WABS*(MEKAL+MEKAL), as (ii), but with two 
MEKAL components rather than one, in order to simulate 
the case where there is a significant cool core in the cluster. 

The best-fitting model of these four is usually used to
derive the luminosity and temperature of the cluster, but
if the best-fitting model does not give sensible parameters,
then the next best model will be selected, and so on. The
accepted ranges are 0.3 keV < Tx < 17.0 keV and luminosity
less than 5 x 10^46 erg s^-1. It is important to note, however,
that these luminosity values are not aperture corrected and
only relate to the luminosity originating from the source
extraction ellipse defined by Xapa. In general, a cluster will
be more extended than this ellipse, and so these aperture
luminosities, Lx,ap, need to be corrected for missing flux
using a spatial model. We describe how such models are fit to
XCS candidates in Section 4.3.
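The fall-back logic for choosing among the four models can be sketched as follows. The accepted parameter ranges come from the text; ranking the models by their fit statistic (lower is better) is an assumption, since the text does not say how 'best-fitting' is judged, and the dictionary keys are hypothetical.

```python
def select_model(fits, tx_range_kev=(0.3, 17.0), lx_max_erg_s=5e46):
    """Return the best-ranked fit whose parameters fall in the accepted ranges.

    fits: list of dicts with hypothetical keys
        'name', 'stat' (fit statistic, ASSUMED lower = better),
        'tx_kev' and 'lx_erg_s'.
    Returns None if no model gives sensible parameters.
    """
    for fit in sorted(fits, key=lambda f: f["stat"]):
        sensible_tx = tx_range_kev[0] < fit["tx_kev"] < tx_range_kev[1]
        sensible_lx = fit["lx_erg_s"] < lx_max_erg_s
        if sensible_tx and sensible_lx:
            return fit
    return None
```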

Figure 16. Fractional temperature uncertainty as a function of
number of soft-band counts as a result of fitting simulated z =
0.5 MEKAL spectra with different temperatures, going from cool
to hot clusters. The red, orange, yellow, green and blue points
represent spectra of 1.5, 2, 3, 5 & 8 keV respectively.

The 68% uncertainty bounds on the best-fit Lx,ap and
Tx values are provided as part of the standard XSPEC
fitting process: the parameter in question is stepped from its
best-fit value until the fit statistic increases by the amount
required for the confidence region needed (at each step point
the other free parameters are refit). This stepping is done in
both the positive and negative directions to obtain a
confidence region.
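The stepping procedure can be illustrated with a simplified sketch. It is not XSPEC's algorithm in detail: `profile_stat` is a hypothetical callable that returns the fit statistic with the parameter of interest held at a given value and all other free parameters refit, a fixed step size is used, and the threshold of 1.0 is the usual increase for a 68% interval on a single parameter.

```python
def confidence_interval(profile_stat, best_value, delta=1.0, step=0.01,
                        max_steps=100000):
    """Step a parameter away from its best-fit value in both directions until
    the fit statistic has risen by at least delta.

    profile_stat(value) -> float is a placeholder for 'fix the parameter at
    value, refit everything else, return the statistic'.
    Returns (lower_bound, upper_bound). If max_steps is reached, the last
    value tried is returned for that side.
    """
    best_stat = profile_stat(best_value)
    bounds = []
    for direction in (-1.0, 1.0):
        value = best_value
        for _ in range(max_steps):
            value += direction * step
            if profile_stat(value) - best_stat >= delta:
                break
        bounds.append(value)
    return bounds[0], bounds[1]
```

For a statistic that is not symmetric about its minimum, stepping separately in each direction naturally yields asymmetric error bars.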



4.2.3 Tx-Pipeline Validation

The spectral pipeline is fully automated and so it is
important to check the reliability of the results it produces
before using them for scientific studies (such as the
measurement of cosmological parameters). We have performed
these checks using both XSPEC simulations and actual data. The
results of the first check are presented in Fig. 16, where
temperature uncertainty is plotted against the number of
counts in the fitted spectrum. For this test we have simulated
cluster spectra (all at z = 0.5), using the MEKAL model, with a
range of temperatures (1.5 keV < Tx < 8 keV). It can be
seen that it is much easier to constrain the temperatures of
cool systems (red) than it is for the hottest systems (blue).
The constraints also become worse as the number of source
counts decreases. It can be seen that below 300 counts the
temperature uncertainties exceed 50 per cent for the 8 keV
systems, though they are considerably smaller than that for
lower temperature systems. This test has informed our
decision as to what count limit we should impose on the
candidate list when defining a sample for cosmological tests.
We have set this limit at 300 because we use Tx values
as a mass proxy when measuring cosmological parameters
(Sahlen et al. 2009).



The results in Fig. 16 were based on model spectra (al- 
beit with actual XMM background contamination) and so 
should be seen as a best-case scenario: real clusters do not 
have a perfectly isothermal ICM, nor have zero contami- 
nation from point sources. Therefore we have carried out a 
related test using four real clusters (Table [cTj ), the results of 

Figure 17. The XCS determined X-ray temperatures (and
uncertainty) as a function of the number of counts in the fitted
spectrum. Each colour represents a cluster that was detected with
more than 5000 counts. For details of the four clusters used in this
plot, see Table C1. The respective exposures were then subdivided
to generate lower count spectra. Note that the higher temperature
systems do not yield fits at the low-count end. The 1-σ error
bars come from the XSPEC fitting software (see Section 4.2.2 for
details).



Figure 18. Comparison of XCS determined X-ray temperatures
when the cluster is observed off-axis (y-axis) or on-axis (x-axis).
For details of the eight clusters used in this plot, see Table C2.
The solid line shows the one-to-one relationship. The error bars
are 1-σ. Both x and y-errors come from XSPEC (see Section 4.2.2
for details).



which can be seen in Fig. 17. Here the best-fit Tx value (and
its 1-σ uncertainty from XSPEC) is plotted against the total
number of counts in the spectra. The lower count spectra were
generated by artificially reducing the exposure time of the
respective event file. We note that only one realisation of this
procedure was performed for each total number of soft-band counts
and that the error bars are the standard XSPEC generated
values. From Fig. 17 it is clear that there are no systematic
biases in the derived values of Tx as the number of counts
decreases. The error bars increase, with decreasing counts,
in line with the expectation from Fig. 16. The fit failed to
converge at low counts for the hotter clusters, but in general
Fig. 17 supports our decision to cut the candidate list at 300
counts for cosmological studies. This test also demonstrates 
that it is still worth fitting candidates with fewer counts, 
since we can derive reliable Tx values in the galaxy groups 
regime down to 100 counts. 

We have carried out a test to see if the Tx-pipeline 
works at large off-axis angles, since candidates are located 
across the XMM field of view. There are not many clus- 
ters to choose from for this test, but we did identify eight 
systems that have been observed by XMM both on and off- 
axis. For this purpose we define off-axis [on-axis] to mean a 
source centroid more than 10 [less than 3] arcmin from the 
ObsID aim point. The standard XCS spectral reduction was 
undertaken and the results can be seen in Fig. 18. It can be
seen that the fits to spectra taken off-axis, while in general 



having larger uncertainties due to having a lower signal-to- 
noise, are consistent with the corresponding on-axis results. 
We can therefore be confident that the pipeline produces
XCS Tx values that are not biased in cases where the
objects are located on the outskirts of the field of view.

The test in Fig. 18 shows that the XCS pipeline is
internally consistent, but it is also important to compare XCS
parameters to those derived externally, i.e. by other authors, 
since they will use different approaches. In particular, most 
cluster spectral fitting is done on an object by object basis, 
with the background regions and the light curve cleaning 
being adjusted by hand. By contrast the XCS pipeline is 
completely automated because we do not have the resources 
to fit hundreds of candidates individually. We have therefore 
tested the quality of the results from our pipeline using pre- 
viously published results. We have constructed a sample of 
11 XCS clusters which have previously published temperatures
measured with XMM (Pacaud et al. 2007; Gastaldello et al.
2007; Hoeft et al. 2008). The results can be seen
in Fig. 19, where the temperatures derived from the XCS
pipeline are plotted against those measured by other au- 
thors. It can be seen that there does not appear to be any 
systematic offset and the XCS temperatures are consistent 
with the literature values. This final test demonstrates that 
the XCS Tx values are reliable and hence suitable for
science applications without the need for a further 'hands on'
analysis stage. 

Figure 19. Comparison of XCS determined X-ray temperatures
with values determined by other authors. For details of the eleven
clusters used in this plot, see Table C3. The solid line shows the
one-to-one relationship. The error bars are 1-σ. The x-errors are
as quoted in the literature. The y-errors come from XSPEC (see
Section 4.2.2 for details).



4.3 Spatial Fitting: X-ray Luminosities 



As mentioned above (Section 4.2.2), the spectral pipeline
produces both luminosity and temperature fits, but the
luminosities, Lx,ap, are within an aperture and are not
corrected for missing flux. In order to extrapolate the cluster
emission to R500, so that the total cluster luminosity can be
calculated, it is necessary to measure the surface brightness
profile. This is achieved in the XCS spatial pipeline, the
Lx-pipeline hereafter, by fitting an analytical function to
the cluster and then using this to extrapolate to R500. We
decided against using the alternative, non-parametric
approach that produces de-projected gas densities, e.g.
Croston et al. (2008), because it is complex and, importantly,
does not allow us to extrapolate fluxes to R500.



4.3.1 Spatial Models 

Surface brightness fits are performed for every candidate 
that passes the spectroscopic pipeline. The main function 
used to characterise the shape of the clusters was a simple
one-dimensional, spherically-symmetric, β-profile model
(Cavaliere & Fusco-Femiano 1976):

$$ S(r) = S(0) \left[ 1 + \left( \frac{r}{r_c} \right)^2 \right]^{-3\beta + 1/2} \qquad (2) $$

where r_c is the core radius and β is the density index
parameter, which encodes the power-law decline. Three different
types of β-profile models were fitted to the data:



(i) One with β frozen at the canonical value of 2/3. The
free parameters are the normalisation and the R.A. and Dec. 
of the centroid. 

(ii) One as (i), but with β allowed to vary.

(iii) One with an inner power-law cusp inside a certain 
parameterised radius (usually of the order of the core ra- 
dius). This gives us a crude description of clusters with cool 
cores or AGN contamination. The free parameters are as 
(ii), plus both an extra normalisation and an extra power 
law index. 

The same background regions were used for the sur- 
face brightness fitting as were used for the spectral fitting. 
However, in addition to knowing the total number of counts 
in the background region, it is also necessary to know how 
those counts are distributed. The total XMM background 
varies considerably across the field of view and so for ex- 
tended sources, such as clusters, one cannot assume that 
the background counts are divided equally between all the 
pixels. The background can be considered as having two 
components, an 'X-ray component' that is focused (and so 
vignetted) by the telescope mirrors and a 'particle
component' that is not. (In reality, these terms are not particularly
accurate, since the X-ray component includes soft protons 
that are focused by the mirrors and the particle component 
includes high-energy photons that are created as the result 
of particle collisions with the telescope structure.) 

The X-ray and particle components need to be treated 
separately during spatial fitting because their spatial varia- 
tion is different: the X-ray component is assumed to vary in 
the same way as the exposure map, because it is vignetted, 
whereas the particle component should show no positional 
dependence. To determine the particle background count- 
rate per pixel, we have selected, from the respective ObsID, 
two or more source regions that are at significantly different 
off-axis angles. We then compare the ratio of the normalised 
counts within those regions to the ratio of the same regions 
in the exposure map. The difference between those ratios 
tells us at what level the counts are contaminated by the 
particle background. This process is illustrated in Fig. B8.
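
The separation of the two background components described above can be sketched as a two-component linear model. This is an illustrative assumption rather than the XCS code: it supposes that the mean counts per pixel in region i are c_i = x * e_i + p, where e_i is the mean exposure-map value in that region, x is the vignetted (X-ray) rate and p is the flat particle level. The function name and the example numbers are hypothetical.

```python
def split_background(counts_per_pixel, exposure_per_pixel):
    """Estimate the vignetted rate x and the flat particle level p.

    Fits c_i = x * e_i + p by ordinary least squares over two or more
    regions at different off-axis angles (i.e. different exposure-map
    values). With exactly two regions the fit is an exact solution.
    """
    n = len(counts_per_pixel)
    if n < 2 or n != len(exposure_per_pixel):
        raise ValueError("need matching lists with at least two regions")
    mean_e = sum(exposure_per_pixel) / n
    mean_c = sum(counts_per_pixel) / n
    sxx = sum((e - mean_e) ** 2 for e in exposure_per_pixel)
    if sxx == 0.0:
        raise ValueError("regions must sample different exposure values")
    sxy = sum((e - mean_e) * (c - mean_c)
              for e, c in zip(exposure_per_pixel, counts_per_pixel))
    x = sxy / sxx
    p = mean_c - x * mean_e
    return x, p


# Hypothetical example: the exposure map falls by a factor of 2 between
# the two regions, but the counts per pixel only fall from 0.025 to 0.015.
x_rate, particle = split_background([0.025, 0.015], [20000.0, 10000.0])
```

In this example the counts ratio (0.6) is larger than the exposure ratio (0.5); that excess is what the flat particle term absorbs.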



4.3.2 Spatial Fitting 

As with the spectroscopic pipeline, the fit is performed
simultaneously on all ObsIDs, and sub-exposures, in which
the candidate was observed (barring those with very few
counts), see Section 4.2.1. For the spatial fitting we
generated new, 4.35 arcsec pixel, image files in the same 0.3
keV to 7.9 keV band as was used for the Tx-pipeline. The
three spatial models (see above) were convolved with the
1-d EAM (Section 2.1.1) XMM point-spread function model
before the fitting took place. They were then multiplied by 
the exposure map at the respective location, in order to add 
observational effects such as vignetting and chip gaps. The 
background was accounted for as described above. 

The maximum-likelihood Cash statistic was used for the 
comparison between the model and the data. The MINUIT 
package (James & Roos 1975) of minimisation algorithms
was used to find the best fitting of the three models. The
best-fitting model was then used to calculate the scaling of
the luminosity from the spectral extraction region to R500.
This was achieved by calculating the ratio of the summed 

Figure 20. Schematic of how the luminosity uncertainties are
calculated by combining the uncertainties on the two quantities
(P1,P2). The cross represents the best-fit point and the dotted
line represents the (unknown) 1-σ confidence contour. The model
luminosity is evaluated at A, B, C and D, and the maximum and
minimum values are used as upper and lower uncertainty bounds.







emission from the spectral extraction region (i.e. e/) to that
from a circular region, radius R500. The ratio was then used
to scale Lx,ap to Lx,500. This luminosity scaling value is
typically in the range 0.9 to 3.0, depending on the complex 
interplay between the cluster size and redshift, the location 
on the field of view, and the depth of the exposure. 
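
As a rough illustration of the luminosity scaling, the β-profile of Equation 2 can be integrated analytically over a circle. The sketch below is a simplification and not the XCS pipeline: it assumes a circular (rather than elliptical) extraction aperture and ignores the PSF, the exposure map and the background. All function and variable names are hypothetical.

```python
import math


def beta_model_flux(radius, r_core, beta):
    """Integral of the beta-profile over a circle of the given radius.

    Returned in units of pi * S(0). With u = 1 + (r / r_core)^2 the
    integral reduces to r_core^2 * (U^k - 1) / k, where k = 3/2 - 3*beta
    and U is u evaluated at the outer radius; k = 0 gives a logarithm.
    """
    k = 1.5 - 3.0 * beta
    u_max = 1.0 + (radius / r_core) ** 2
    if abs(k) < 1e-12:
        return r_core ** 2 * math.log(u_max)
    return r_core ** 2 * (u_max ** k - 1.0) / k


def luminosity_scaling(r_aperture, r500, r_core, beta):
    """Ratio of model flux within R500 to that within the aperture."""
    return (beta_model_flux(r500, r_core, beta)
            / beta_model_flux(r_aperture, r_core, beta))


# Hypothetical example in units of the core radius, with beta = 2/3.
scale = luminosity_scaling(math.sqrt(3.0), math.sqrt(15.0), 1.0, 2.0 / 3.0)
```

For these inputs the enclosed fluxes are 0.5 and 0.75 (in units of π S(0) r_c²), so the scaling is 1.5, inside the 0.9 to 3.0 range quoted above.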

The 1-σ uncertainty bounds on the free parameters in
the spatial fits were generated in a similar way to that used
in the Tx-pipeline (Section 4.2.2), i.e. by stepping, and
fixing, the parameter of interest. This was done separately for
each of the three models used in the spatial fitting. The
uncertainty bounds on the Lx,500 were not so straightforward
to calculate, however. This is because the Lx,500 calculation
involves both the Tx-pipeline and the Lx-pipeline and, since
they are performed separately, there is no information on
the correlations between them. Ideally one would carry out
a simultaneous fitting of a spectral and spatial model to a
data cube (X, Y and energy), as was demonstrated by
Lloyd-Davies et al. (2000), but this process would be too
complex and CPU intensive for the batch processing required
by XCS. Therefore, we adopt the conservative approach of
taking the uncertainty bounds for the two quantities (i.e.
on the luminosity scaling value and on Lx,ap) and
calculating the luminosities for the four most extreme
combinations (Fig. 20). The highest and lowest luminosities
are then used as the uncertainty bounds of the Lx,500
measurement, although this will almost certainly be an
overestimate.
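
The corner-evaluation scheme of Fig. 20 amounts to a few lines of arithmetic. The sketch below assumes that Lx,500 is simply the product of the aperture luminosity and the luminosity scaling value; the function name and the example numbers are hypothetical.

```python
def lx500_bounds(l_ap, l_ap_lo, l_ap_hi, scale, scale_lo, scale_hi):
    """Best value and conservative bounds on Lx,500 = scale * Lx,ap.

    The product is evaluated at the four extreme combinations of the
    1-sigma bounds on the two quantities; the smallest and largest
    results are returned as the lower and upper bounds.
    """
    best = l_ap * scale
    corners = [a * s for a in (l_ap_lo, l_ap_hi)
               for s in (scale_lo, scale_hi)]
    return best, min(corners), max(corners)


# Hypothetical example: Lx,ap = 2.0 (1.8 to 2.3), scaling = 1.5 (1.4 to 1.7).
best, lower, upper = lx500_bounds(2.0, 1.8, 2.3, 1.5, 1.4, 1.7)
```

Because the two sets of bounds are combined without any knowledge of their correlation, the resulting interval is wider than a proper joint fit would give, as noted in the text.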

We note that not all of the candidates that are passed to 
the spatial fitting generate acceptable fits. When the spatial 
fitting fails for a candidate, we then estimate the luminosity,
Lx,500, by extrapolating the Lx,ap value assuming a standard
cluster profile. For this, we fix the power law slope to be
β = 2/3 and use a core radius appropriate to the Tx. Thus
all candidates that pass the spectroscopic pipeline will have
either an associated Lx,500 measurement or Lx,500 estimate.



Figure 21. Comparison of the XCS determined outer slope (or
β) with that derived by other authors. In both cases, the
surface brightness of the clusters was fit using a circularly-symmetric
King profile (without a central cusp, see Equation 2). For details
of the four clusters used in this plot, see Table C4. The solid line
shows the one-to-one relationship. The error bars are 1-σ. The
x-errors are as quoted in the literature. The y-errors come from
the XCS fitting software (see Section 4.3.2 for details).



4.3.3 Lx-Pipeline Validation



The Lx-pipeline is fully automated and so it is important to 
check the reliability of the results it produces before using 
them for scientific studies (such as the study of the evolution 
of the Lx-Tx relation). We have done this by comparing the
XCS derived results for β and Lx,500 with those derived by
other authors. First we have examined clusters in common
with the sample of Alshino et al. (2010). This should be a fair
comparison since this sample is a subset of the XMM-LSS
(Pacaud et al. 2007) and is therefore, like XCS, drawn from
an XMM survey. Fig. 21 shows XCS β for 4 clusters plotted
against the β values taken from Alshino et al. (2010). It can
be seen that there is no systematic bias between the values.
In addition, the scatter about the one-to-one relation (solid 
line) is consistent with the measurement uncertainties. We 
can therefore be reassured that the spatial parameters we 
obtain from the spatial pipeline are reliable. 

We have also compared the XCS Lx,500 values against
published values obtained from clusters with 300 or more
counts in the XMM-LSS sample (Pacaud et al. 2007; note
that this is 300 XMM-LSS, not Xapa, counts). The XCS
Lx,500 values are plotted against those of Pacaud et al.
(2007) in Fig. 22. It can be seen that they closely follow
the one-to-one relation (solid line). This test demonstrates
that the XCS Lx,500 values are reliable and hence suitable
for science applications without the need for a further
'hands on' analysis stage.

Figure 22. Comparison of XCS determined bolometric
luminosities within R500 with values determined by other authors. For
details of the six clusters used in this plot, see Table C5. The solid
line shows the one-to-one relationship. The error bars are 1-σ.
The x-errors are as quoted in the literature. The y-errors come
from the XCS fitting software (see Section 4.3.2 for details).



4.4 Spectral Fitting: X-ray redshifts 

We cannot exploit the thousands of candidates that Xapa 
has produced without first determining their redshifts. As
mentioned above (Section 4.1), only a small fraction have
redshifts available from the literature, so we have carried out
both an intensive optical follow-up campaign, and exploited
the SDSS archive, to gather more redshifts. This effort has
yielded redshift information for ~900 additional candidates
to date (M11), but the redshift follow-up of the XCS is still
far from complete. We therefore decided to use the XMM
data itself to constrain candidate redshifts. This process, of
measuring 'X-ray redshifts' or zx, has been demonstrated
by several authors for individual clusters (Hashimoto et al.
2004; Werner et al. 2007; Lamer et al. 2008b; Rosati et al.
2009) and recently on a sample of Chandra clusters by Yu
et al. (2011), and has even been used to study bulk motions
of the gas within bright, nearby clusters (Dupke & Bregman
2001b,a), but has never been used on the industrial scale we
need for XCS. In the following we describe the
X-ray redshift pipeline, zx-pipeline hereafter, and its verifi- 
cation using XCS clusters with known redshifts. 



4.4.1 Generating and Fitting the Spectra



Similar to the Tx-pipeline (Section 4.2), all exposures that
overlapped with a particular candidate were used and a
simultaneous fit was carried out across all the respective spectra.
Because this pipeline will be run on the many thousands 
of candidates that Xapa produces, we needed to keep the 
processing time per candidate to a minimum. We therefore 

Figure 23. Measured X-ray redshifts plotted against optically
determined redshifts for clusters in XCS-DR1 (M11). The solid
line shows the one-to-one relationship. Only X-ray redshifts with
statistical uncertainties of 20 per cent or less are shown. The inset
shows a histogram of the difference between the X-ray redshifts
and optically determined redshifts.



chose a single-temperature MEKAL model, convolved with 
a photoelectric absorption model. Moreover, during the fit- 
ting, only the spectral normalisation was left free. By design, 
we do not want to assume the redshift, so we ran a series of 
fits stepping from z = 0.01 to z = 2, in steps of 0.01. At all 
of these steps, the metallicity was fixed at Z = 0.3 × Solar and
nH at the Dickey & Lockman (1990) value. The Tx was not
free either, but rather calculated (via the Arnaud & Evrard
1999 Lx-Tx relation) from the best-fit normalisation at
that redshift step (assuming no scatter in the Lx-Tx
relation).

At each redshift step, the Cash statistic was recorded,
as demonstrated in Fig. A5. The zx for the candidates is
then chosen by searching for minima in the distribution of
Cash statistic values. Usually the redshift corresponding to
the lowest Cash statistic was used, but if the corresponding
temperature was Tx > 8 keV then the next deepest minimum
was chosen, and so on. This limit was placed on the
allowed temperature because very few Tx > 8 keV clusters
are expected to be detected by XCS (Sahlen et al. 2009). The
1-σ uncertainty on zx, σ_zx, was also determined from the
Cash statistic distribution. We note that in the following we
refer to statistical uncertainties expressed as a percentage,
and by this we mean 100 × σ_zx/zx.
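
The selection of zx from the grid of fits can be sketched as below. The sketch takes the Cash statistic and the Lx-Tx derived temperature at each redshift step as given inputs (in the pipeline these come from the spectral fits); the function name, the treatment of grid end points and the toy input curves are assumptions made for illustration.

```python
def pick_xray_redshift(z_grid, cash, temperature, t_max=8.0):
    """Choose zx from a grid of fits.

    Finds the local minima of the Cash statistic along the redshift
    grid, orders them from deepest to shallowest, and returns the
    redshift of the deepest minimum whose temperature does not exceed
    t_max (in keV). Returns None if no minimum qualifies.
    """
    n = len(cash)
    minima = []
    for i in range(n):
        left = cash[i - 1] if i > 0 else float("inf")
        right = cash[i + 1] if i < n - 1 else float("inf")
        if cash[i] <= left and cash[i] <= right:
            minima.append(i)
    for i in sorted(minima, key=lambda k: cash[k]):
        if temperature[i] <= t_max:
            return z_grid[i]
    return None


# Toy inputs: steps of 0.01 from z = 0.01 to z = 2, with the deepest
# minimum at z = 1.2 (too hot) and a shallower one at z = 0.4.
z_grid = [round(0.01 * k, 2) for k in range(1, 201)]
cash = [min(50.0 + 400.0 * (z - 0.4) ** 2, 40.0 + 400.0 * (z - 1.2) ** 2)
        for z in z_grid]
temperature = [3.0 + 5.5 * z for z in z_grid]
zx = pick_xray_redshift(z_grid, cash, temperature)
```

Here the deepest minimum implies Tx of about 9.6 keV and is rejected, so the shallower minimum at z = 0.4 is returned.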



4.4.2 zx-Pipeline Validation

X-ray redshift measurements have not been attempted with 
this level of automation before. We checked our results to 
see if our zx values are suitable for science applications. We 
did this using clusters in XCS-DR1 that had optically
determined redshifts. Under the assumption that the optical
redshift is correct, we have compared the zx to the ztrue for
42 XCS-DR1 clusters where the zx fit yields a statistical
uncertainty of < 20% and σ_zx < 0.05 (Fig. 23). We chose
σ_zx = 0.05 as the upper limit for this comparison because
this is the typical error on the single colour photometric
redshifts presented in M11. As shown in Fig. 23, the zx fits are
usually good (to within the errors), but the level of
catastrophic failures, i.e. where ztrue - zx = Δz ≫ σ_zx, is high.
The failure rate for X-ray redshifts is 24 per cent, compared
to < 7 per cent for the photometric redshifts presented in
M11.

These catastrophic failures are not unexpected, even
for high signal-to-noise spectra, when the single-temperature
spectral model is too simple, e.g. if there is AGN
contamination or a cool core. Similarly, if the nH and/or abundance
was fixed at the wrong value, or the cluster is not close
enough to the Arnaud & Evrard (1999) Lx-Tx relation,
then one might obtain zx values with small errors, but that
are not physically realistic. We have investigated the
possibility of making cuts on the sample to objectively weed out
the catastrophic failures. However, it does not seem to be
possible to predict, a priori, that a given zx estimate would
be unreliable from any combination of number of counts,
cluster temperature or nH. We discuss in Section 5 how we
plan to use the zx values, despite their high failure rate.



5 DISCUSSION

In Fig. 1 we introduced the complex methodology
associated with the generation of a cluster catalogue based on
serendipitous detections in the XMM archive. We went on
to describe, and verify, all the steps in the methodology that
involve XMM data (other steps are described in our
companion paper, M11). In this section we will discuss each
step again, making reference where appropriate to
predictions made in our pre-launch paper (Romer et al. 2001, R01
hereafter), and in our cosmology forecasting paper (Sahlen
et al. 2009). We also highlight areas for improvement.

In Section 2.2 we described the download of data from
the archive and showed how the area covered by the archive
has grown over the last 10 years. By now there are over 600
deg 2 of the sky covered by XMM, and 51 deg 2 , 276 deg 2
and 410 deg 2 (at > 40 ks, > 10 ks and > ks depths
respectively) are in regions suitable for cluster searching. We
note that the exposure times used in Fig. 2 are after flare
correction (Section 2.3.3), and that flares typically affect
one-third of the exposure. The rate of addition of new area
is slowing over time, reflecting the trend towards repeated
observations and fewer, but longer, exposures. It is,
therefore, almost certain that XCS will not reach the 800 deg 2
target set in R01. The revised target of 500 deg 2 set in
Sahlen et al. (2009) does remain achievable though (as
long as no minimum exposure time cut is applied).

The distribution and average (requested) exposure
times of ObsIDs in the public archive are close to what was
anticipated in R01, but due to the unanticipated need for
flare correction, the average usable exposure is only 13 ks
(compared to a requested average of 20 ks and a predicted,
in R01, average of 22 ks). These decreases, in exposure time
and areal coverage, will certainly impact the size of the final
XCS cluster catalogue. However, we have been able to use
a lower minimum acceptable source significance (4-σ rather
than the 8-σ used in R01) - because the Xapa extent
determination is more effective at low signal-to-noise than
expected from previous experience with ROSAT - and this
will help to keep the cluster numbers up.

In Sections 2.3 and 2.4 we described the reduction of
the downloaded data, including mitigation of time periods
affected by background flares, and the production of
images. This was done in a fairly standard way, albeit on a
much larger, and more automated, scale than is typical. In
R01 we expected that XCS source detection would be
carried out only in EPIC-pn images, because we assumed that
it would be too complicated to carry out selection
function tests on merged images. However, in practice we have
been able to run source searching and selection functions on
merged images without any difficulty (Section 3.4.3). This
has helped compensate for the decreased sensitivity, and
increased background levels, of the EPIC-pn CCDs compared
to pre-launch predictions. One thing that was not
anticipated in R01 was the need to create mask files by hand for
about a third of the ObsIDs (Section 2.4.1). This tedious
process has been carried out by student volunteers and has
not actually held up the processing of the archive
significantly.

Overall, we are satisfied by the performance of the
procedures described in Section 2 and do not plan any major
modifications in future. That said, we did uncover during
the quality control stage that some of the masks were too
small and also that a small fraction of the reduced images had
an atypically high background (see M11). These two factors
have resulted in contamination in the candidate list at the
~7 per cent level. To avoid such contamination in future,
we have improved the way that eye-ball checks of reduced
images will be carried out.

In Section 3 we described the generation of the XCS
source catalogue using the XCS Automated Pipeline
Algorithm (Xapa), and the tests we have carried out to
demonstrate its efficacy. In Section 3.1 we explained how Xapa
applies multi-scale wavelets to generate a source list per
ObsID, and discussed some of the successes of Xapa, including
the ability to detect sources over a wide range of sizes and
signal to noise: only very rarely does one look at an
image of an ObsID, with Xapa ellipses overlaid, and see real
sources that have been missed or artefacts (e.g. chip edges,
where there can be discontinuities in the background level)
misidentified as sources. We are especially pleased with the
Xapa vision model (Section 3.1.2) because of its ability to
detect sources within sources and to fit source ellipses. Dur- 
ing the development of Xapa, it was found that vagaries of 
the XMM optics could result in false source detections (e.g. 
when point sources had extended lobes, due to the complex 
off-axis PSF), or incorrect size measurements (e.g. when an 
extended source had a cuspy core). These issues were ad- 
dressed with additional sub-algorithms. 

As shown in Section 3.3, the parameterisation of point
sources (fluxes, positions etc.) is very good. Not only does 
this give us confidence that the extended source centroids 
are suitable for cluster searching, it also demonstrates that 
the point source catalogue itself can be used for science ap- 
plications. XCS members and collaborators are using the 
data products in the point source catalogue in a variety of 
ways, including a study of the evolution of quasar X-ray 
spectra and a search for X-ray Dim Isolated Neutron Stars
(or XDINS, see Haberl 2007 for a review of XDINS).



We do have some concerns, however, about the
parameterisation of extent by Xapa (Section 3.2.3) because
the available PSF models are known to have deficiencies, 
especially off-axis. Occasionally, during the quality control 
stage (M11), we see sources that are obviously (from the
X-ray data themselves and/or from the related optical im- 
age) point-like but that have been classified as extended 
and erroneously entered into the candidate list. Likewise 
there are likely to be incidences of extended sources that 
are detected but falsely classified as point-like (or flagged as 
PSF-like). The latter effect was indicated by the selection
function test using numerically generated synthetic clusters
(Section 3.4.5, see below). For these reasons we plan to adapt
Xapa to use a new 2-d PSF model that is currently under
development by Read et al. (2010). The implementation
of this new 2-d PSF model would be a major undertaking
because it would require the extent determination
sub-algorithm of Xapa to be rewritten and also necessitate 
the recalculation of the survey selection functions. It is also 
worth pointing out that even a perfect PSF model cannot 
prevent very nearby (on the sky) sources from becoming 
blended into a single source, especially when the signal-to- 
noise is low. These blends will always affect our candidate 
list at some level (as they will any cluster searching project 
based on XMM detections); some will be obvious from the 
optical follow-up (see M11 for examples), but some might
well require higher resolution imaging, e.g. from Chandra, 
to be identified. 

The collation of a Master Detection List (MDL) for the
survey (Section 3.2.7) has been fairly straightforward,
despite the fact that so many ObsIDs overlap (and so many
sources are detected multiple times in the archive). However, 
we have found that the process by which duplicate extended 
sources are identified (via a fixed matching radius of 30 arc- 
sec) does not always work at low redshifts (z<0.2), so we 
are in the process of improving this aspect of Xapa. As of 
May 1st 2010, the MDL contained 114,711 point sources and 
12,582 extended sources, although, as just noted, a small 
number of the extended sources will be duplicate entries. 
We have selected 3,675 of the extended sources as cluster 
candidates, after making a series of cuts to the extended 
source list, and these have then been passed onto the
post-processing and optical follow-up steps described in Section 4
and in M11.
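
The fixed-radius duplicate matching mentioned above can be sketched as follows. This is not the Xapa implementation: it is a brute-force pairwise comparison using the haversine formula for the angular separation, with hypothetical function names and example coordinates, and only the 30 arcsec radius is taken from the text.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (arcsec) between two positions in degrees.

    Uses the haversine formula, which is numerically stable for the
    small separations relevant to duplicate matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2)
         * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(h))) * 3600.0


def find_duplicates(sources, radius_arcsec=30.0):
    """Return index pairs of sources closer than the matching radius.

    `sources` is a list of (ra, dec) tuples in degrees. The pairwise
    loop is O(n^2) and intended only as an illustration.
    """
    pairs = []
    for i in range(len(sources)):
        for j in range(i + 1, len(sources)):
            if angular_sep_arcsec(*sources[i], *sources[j]) < radius_arcsec:
                pairs.append((i, j))
    return pairs


# Hypothetical detections: the first two lie 20 arcsec apart.
detections = [(10.0, -5.0), (10.0, -5.0 + 20.0 / 3600.0), (10.5, -5.0)]
duplicates = find_duplicates(detections)
```

A fixed radius of this kind can miss repeat detections of nearby clusters, whose large angular size allows centroids from different ObsIDs to differ by more than the matching radius; this is consistent with the low-redshift problem noted in the text.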

We stress that the MDL, and hence the candidate list, 
was derived from the analysis of individual ObsIDs, even 
in regions where different ObsIDs overlap. In fact, approx- 
imately 40 per cent of ObsIDs in the XMM public archive 
have significant overlap with other ObsIDs, with a median 
additional exposure time of 70 per cent. Therefore, it would
be possible to increase the number of sources, and hence
candidates, detected by Xapa by co-adding ObsIDs. However,
this would require a major overhaul of both Xapa and
the selection function methodology (in the latter case
because the point spread function would be significantly more
complicated), and we have no plans to use co-added ObsIDs
in XCS. That said, we do take advantage of multiple
exposures when running the Tx and Lx-pipelines.

We note that the XCS is not the largest compilation 
of XMM detections; the XMM-Newton Serendipitous Survey
2XMM catalogue (Watson et al. 2009) contains 191,870
sources discovered in 3,491 XMM ObsIDs. We make
comparisons between XCS and 2XMM point and extended sources
in Sections 3.3 and 3.4.1, finding them to be in good
agreement. We have compared XCS to another sample of XMM
selected clusters (the XMM-LSS, Section 3.4.2), and find
them to be in good agreement also: only four of the 33
XMM-LSS 'class 1' extended sources did not make it into
the candidate list (because they did not meet the extent
and/or signal-to-noise criteria).

Selection functions are very important to any cluster 
survey that plans to carry out statistical studies, such as 
the measurement of scaling relations or cosmological param- 
eters. The XCS selection functions need to describe survey 
completeness as a function of a wide number of parameters, 
and are thus very CPU intensive. Examples of our selection 
function work so far are given in Sections 3.4.4 and 3.4.5. We
have demonstrated, using simple analytical models for the 
ICM distribution, that XCS can detect typical (for the local 
Lx-Tx relation), 3 and 6 keV clusters to high redshifts,
but that the percentage recovery of the cooler (i.e. fainter)
clusters drops off rapidly, e.g. from roughly 60% at z = 0.6
to 10% at z = 0.9 for 3 keV clusters (Fig. A3). These
predictions of the selection function redshift dependence were
based on the assumption that all clusters have core radii of
r_c = 160 kpc, so we have also investigated our sensitivity
to smaller and larger clusters (Fig. 14). We found that for
clusters with more than 300 counts, the cluster recovery rate
is good (>70%) when the extent is in the range ~10-20
arcsec. These limits roughly translate to 0.1 < z < 0.6 for
r_c = 50 kpc and z > 0.3 for, more typical, r_c = 160 kpc.
XCS is not as sensitive to clusters with core radii at the top
end of the Jones & Forman (1984) range; roughly only 20%
[40%] of 300-count clusters with r_c = 400 kpc are recovered
at z > 0.3 [z > 1], although this rises to 60% [75%] for 1000-
count clusters. This insensitivity was not anticipated in R01
(i.e. before we had access to realistic selection functions); we
claimed therein that all clusters with core radii larger than
20 arcsec would be flagged as extended sources. It may seem
counter-intuitive that clusters of larger angular extent are
harder to recover, but this is due to two factors: first, more
extended clusters have lower surface brightnesses and
correspondingly lower contrast against the background, making
them harder to detect, and second, our wavelet scales
were chosen with more compact clusters (< 250 kpc, R01) in
mind.

In Section 3.4.5 we have used numerically-generated
'clusters', from the CLEF hydrodynamical simulation of Kay
et al. (2007), to investigate whether factors such as cool cores
(which result in luminosity enhancements at small radii), or
recent merging activity, might impact the ability of XCS to
detect clusters. We found that the numerical 'clusters' are
easier to detect than the analytical β-profile 'clusters', but
that they are more likely to be misclassified (as point-like) 
when they contain cool cores. This effect is reduced as the
number of 'source' counts increases, and above 500 counts no
longer occurs (Fig. 15). This test justifies the use of selection
functions based on simple analytical cluster profiles in the
XCS cosmology forecasting paper (Sahlén et al. 2009, which
was based on a minimum count threshold of 500). This is
important because CLEF, and most other hydrodynamical
samples, are only available for a single assumed underlying
cosmology and, in order to use XCS to measure the underlying
cosmology, we need to know the selection function across
a range of cosmological parameters. However, this test fur- 
ther suggests that it may not be appropriate to use only 
simple analytical profiles when establishing selection func- 
tions for a minimum count threshold of 300 (which we have 
determined is the limit to which we can expect to recover 
reliable Tx measurements, see below), and this is something 
we plan to investigate further. 

In Section 4.1 we described an automated search for
redshifts available in the literature. When this search was
run in May 2011, a total of 493 candidates were associated
with published redshifts using NED. We have found, however,
during the preparation of the first XCS data release
(XCS-DR1, M11), that the NED redshifts are not always
appropriate for their respective candidate: the match to a given
NED cluster might be erroneous, especially at low redshifts
(where the allowed matching radius is large). We have therefore
imposed a default (see footnote 4) minimum redshift limit of z = 0.08
when using literature redshifts. Moreover, even if the match
to the NED cluster is correct, the default NED redshift for
that cluster might not be the best one available in the
literature. Therefore, of the 493 redshifts automatically selected
from NED, only 127 were included in XCS-DR1. That said,
NED still contributed more redshifts to the catalogue than
any of the other optical follow-up methods used in M11.

In Section 4.2 we have described and verified an automated
method to derive X-ray temperatures from the XMM
archive. We have shown that reliable Tx values can be obtained
for most clusters if more than 300 soft-band counts
are available in the background-subtracted spectrum (Figures
16 and 17). We have further shown that our technique
works well even at large off-axis angles (Fig. 18) and that our
automatically generated results are consistent with those derived
by other authors using more traditional spectral fitting
methods (Fig. 19). We note that being able to fit Tx down
to 300 counts was not anticipated in R01, where we assumed
the minimum counts threshold for Tx measurement would
be 1000. This decrease can be attributed to our adaptive
spectral binning technique and our use of Cash (rather than
Gaussian) statistics in the fitting.

In R01 we predicted that up to 1,800 XMM clusters
might yield temperatures (with < 20 per cent errors). By
comparison (by May 1st 2011), we had made only 292 Tx
measurements (with < 20 per cent errors) for candidates
with optically determined redshifts, although, when the error
threshold is relaxed to < 40 per cent, the number rises
to 587. Of these 587, 357 were determined from candidates
detected with 300 or more background-subtracted counts
(Fig. 24). Even when we set the error threshold at 10% (the
calibration uncertainty for the satellite), we still have 122
clusters (112 with over 300 counts) remaining. For these
122, it would not be worth doing further XMM follow-up,
although some high resolution Chandra imaging would be
worthwhile to elucidate the impact of point source contamination
on derived temperatures (as we have done successfully
for XMMXCS J2215.9-1738, see Hilton et al. 2010).
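
The role of the Cash statistic in lowering the count threshold can be illustrated with a toy fit. The sketch below is not the XCS Tx-pipeline: it assumes a fixed spectral shape whose only free parameter is its normalisation, the binned counts are made-up numbers chosen to resemble a low-count spectrum, and the names cash_statistic and best_scale are ours. It evaluates C = 2 Σ (m_i - n_i ln m_i), the form appropriate for Poisson-distributed counts n_i given model counts m_i, which remains well defined in bins with very few (or zero) observed counts.

```python
import math


def cash_statistic(observed_counts, model_counts):
    """Cash statistic, C = 2 * sum(m - n * ln m), for Poisson-distributed data."""
    total = 0.0
    for n, m in zip(observed_counts, model_counts):
        if m <= 0.0:
            raise ValueError("model counts must be positive in every bin")
        total += m - n * math.log(m)
    return 2.0 * total


def best_scale(observed_counts, template, scales):
    """Grid-search the normalisation of a fixed template by minimising C."""
    return min(
        scales,
        key=lambda s: cash_statistic(observed_counts, [s * t for t in template]),
    )


# Hypothetical falling spectral template and made-up low-count data (30 bins).
template = [math.exp(-0.1 * i) for i in range(30)]
observed = [9, 11, 7, 8, 6, 7, 4, 6, 5, 3, 4, 2, 4, 3, 1,
            2, 3, 1, 2, 0, 1, 2, 0, 1, 1, 0, 1, 0, 0, 1]
scales = [1.0 + 0.05 * k for k in range(381)]

print("best-fit normalisation:", best_scale(observed, template, scales))
print("analytic expectation:  ", sum(observed) / sum(template))
```

For this one-parameter case the minimum of C lies where the total model counts equal the total observed counts, so the grid search can be checked against the ratio printed on the last line.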



4 In principle we would have been prepared to assign z_lit < 0.08
values to XCS-DR1 clusters if they had a measured photometric
redshift of z_phot ≤ 0.1. However, in practice there were no such
cases, see M11.



Errors on Tx of 40% are too large for some of the science
applications we have in mind for XCS, e.g. studies of the
evolution in the scatter on the Lx-Tx relation (since the
intrinsic scatter is < 40%). Therefore, we have made requests
for additional XMM follow-up of certain XCS clusters. We
note that only 278 of the 587 candidates with Tx measurements
(< 40 per cent errors) are included in XCS-DR1 (see footnote 5).
That is not to say that the remainder are not clusters, but
rather that they cannot be confirmed as being so using the
currently available optical and X-ray data (see M11). Even
so, the size of the XCS-DR1 Tx sample is still much larger
than any previously published compilation of cluster Tx
measurements (with either heterogeneous or homogeneous selection).

In Section 4.3 we described and verified an automated
method to derive X-ray luminosities from the XMM archive,
the Lx-pipeline. This pipeline is run on any candidate for
which a Tx measurement has been made via the Tx-pipeline.
We demonstrated that the parameters that come out of the
spatial fitting are robust, as compared to previously published
work (Figures 21 and 22). Limitations of the current
method include the reliance on circularly-symmetric models
and the lack of covariance information between the Tx-pipeline
and the Lx-pipeline. Addressing these two issues is
possible, but given that we are often fitting to only a few
hundred counts, and currently using a circularly-symmetric
PSF, we have no plans to adjust the Lx-pipeline accordingly
(because it would increase the computational complexity
significantly). A further limitation of the current Lx-pipeline
is that the error on the input redshift is assumed to be zero.
This simplification is justified for spectroscopically determined
redshifts, but not for photometric, or X-ray (see below),
redshifts. This issue will be addressed before the Lx
values are used in a future study of the evolution of the
Lx-Tx relation.

The impact of redshift errors notwithstanding, the un- 
certainty on the Lx value for a given candidate will be much 
smaller than the associated error on Tx (because the ICM 
emission is only a weak function of Tx). Therefore, we do 
not think it is necessary to carry out a large-scale XMM 
follow-up campaign in order to improve the Lx measure- 
ments. That said, we are planning to request XMM snap- 
shots of clusters that were discovered so close to the edge of 
the field of view that a large fraction of their flux was not 
captured in their respective EPIC images. These snapshots 
will allow us to get a better estimate of their total flux. We 
also plan to make Chandra snapshot requests of a represen- 
tative subsample of XCS clusters, in order to gauge, in a 
statistical sense, the impact of point source contamination 
on XCS Lx values (although this test may be possible us- 
ing the existing Chandra archive and we will explore that 
avenue first). 

In Section 4.4 we described a method to extract X-ray
redshifts directly from the discovery data to supplement the
XCS optical follow-up efforts. As shown in Fig. 23, acceptable
(Δz < 0.1) redshift measurements are made in ~75 per
cent of the cases when thresholds on the zx-pipeline errors
are set at < 20% and σ_zx < 0.05. To date, 453 candidates
have yielded zx measurements that meet these criteria. We
have used zx estimates to preselect candidates for optical
follow-up and this approach has been successful, e.g. one
cluster with zx = 0.84 was demonstrated to have a true
redshift of z = 0.83 based on subsequent Gemini GMOS
spectroscopic observations (M11). The level of catastrophic
redshift errors is much higher for X-ray redshifts than for
optical photometric techniques, so all clusters with only zx
values will ultimately have to be followed up with optical
photometry or spectroscopy.

5 An additional 124 Tx measurements with larger errors
are also included in XCS-DR1.

Figure 24. Number of clusters with less than 40 per cent temperature errors and with more than 300 soft-band counts (blue) and with
no count cut (green) plotted against measured temperature (Left panel) and fractional temperature error (Right panel).

The impact of zx errors on Lx measurements is signif- 
icant, and so Lx values that rely on zx will not be used for 
science applications. However, we have determined that the 
impact of zx errors on Tx measurements is not significant: as
shown in Fig. 25, an absolute redshift uncertainty of Δz = 0.3
induces a less than 30 per cent Tx uncertainty. For this reason,
it will be possible to use zx determined Tx values to
select XCS clusters for SZ follow-up, because the SZ effect
is (roughly speaking) redshift independent. Most current SZ
instruments are only sensitive to Tx > 5 keV clusters, but
few of those have been catalogued yet, especially at z > 0.5:
in the BAX database there are only 39 such clusters listed.
In XCS-DR1 there are 31 such clusters (of which 25 are 
in addition to the BAX sample). By comparison, using the 
X-ray redshift technique we have identified 17 more can- 
didates (without other redshift information) meeting those 
criteria. In summary, X-ray redshifts are not a 'magic bullet' 
and optical follow-up is still required in order to secure red- 
shift measurements; however, they do provide some useful 
information, as long as they are used judiciously. 
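
The sensitivity of Lx to redshift errors follows from the conversion of flux to luminosity, Lx = 4π d_L^2 F. The sketch below is illustrative only and is not the Lx-pipeline: it assumes a flat ΛCDM cosmology with H0 = 70 km/s/Mpc and Ωm = 0.3 (our choice of values), ignores K-corrections and any change in the extraction aperture, and uses function names of our own.

```python
import math


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """Luminosity distance in Mpc for a flat LambdaCDM cosmology (assumed values)."""
    c_km_s = 299792.458
    omega_l = 1.0 - omega_m
    dz = z / steps
    total = 0.0
    # Trapezoidal integration of dz' / E(z') from 0 to z.
    for i in range(steps + 1):
        zi = i * dz
        e_z = math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight / e_z
    comoving = c_km_s / h0 * total * dz
    return (1.0 + z) * comoving


def luminosity_ratio(z_assumed, z_true):
    """Factor by which Lx is mis-estimated when z_assumed replaces z_true."""
    return (luminosity_distance_mpc(z_assumed) / luminosity_distance_mpc(z_true)) ** 2


for dz in (0.05, 0.1, 0.3):
    ratio = luminosity_ratio(0.5 + dz, 0.5)
    print(f"z_true = 0.5, redshift error = +{dz}: Lx ratio = {ratio:.2f}")
```

Because the luminosity scales with the square of the luminosity distance, even a redshift error of the size accepted for zx measurements changes the inferred Lx by tens of per cent under these assumptions.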

In summary, we have demonstrated that the X-ray al- 
gorithms developed for XCS are suitable for the compilation 
and analysis of large samples of clusters detected serendipitously
by XMM. In our companion paper (M11) we discuss
the optical follow-up of candidates and present the first XCS
data release (XCS-DR1), including 402 Tx measurements.
Ongoing science exploitation of XCS-DR1 includes projects
related to cluster scaling relations, fossil groups, the SZ effect
and the derivation of cosmological parameters. We also plan
to apply our post-processing pipelines, which were designed
with serendipitous clusters in mind, to the many hundreds
of target clusters in the XMM archive, so that they too can
benefit from a uniform set of Tx and Lx measurements. Even
though these target clusters cannot be used for XCS statistical
studies, we think this will be a valuable resource for the
community, especially now that Planck is in full operation.

Figure 25. The results of fitting simulated MEKAL spectra with
a temperature of 3.0 keV and redshifts of 0.2 (solid), 0.6 (dashed)
and 1.0 (dotted) showing fractional temperature uncertainty as a
function of absolute redshift uncertainty.



6 CONCLUSIONS 

(i) We have demonstrated that the XMM archive is a rich 
resource for serendipitous cluster detection out to redshifts 
of at least z = 1.5.

(ii) The archive now covers over 600 square degrees that
can be used for serendipitous source detection and, of this,
51 deg^2, 276 deg^2 and 410 deg^2 (at >40 ks, >10 ks and >0 ks
depths respectively) are available for cluster detection.

(iii) We have shown that typically one-third of a given
XMM exposure is rendered unusable due to background
flares.

(iv) We have shown that it is possible to exploit the whole 
XMM archive in a uniform and reproducible way. 

(v) We have developed a source detection pipeline that 
operates across the entire XMM field of view, and is effective 
over a wide range of angular scales and signal-to-noise ratios. It
has many features, including the ability to determine which 
sources are extended beyond the PSF model and to detect 
point-like sources that lie along the line of sight to extended 
sources. 

(vi) We have developed a pipeline that can measure re- 
liable X-ray cluster temperatures. This pipeline has been 
shown to work well even when the cluster is discovered on 
the outskirts of the field of view. We have demonstrated 
that with 300 or more background-subtracted counts, one 
can measure robust, unbiased, temperatures for most clus- 
ters. 

(vii) We have developed a pipeline that can measure reli- 
able X-ray luminosities by making spatial fits to XMM im- 
ages. The derived luminosity values have been shown to be 
robust, as have the fitted spatial parameters. 

(viii) We have developed a pipeline that can measure 'X- 
ray redshifts' for clusters using XMM spectra. These red- 
shifts can help increase the number of clusters with X-ray 
temperature measurements; acceptable (Δz < 0.1) redshift
measurements are made in ~75 per cent of the cases (once
error thresholds have been imposed).

(ix) To date (May 1st 2011), some key statistics for XCS 
are as follows: 5,776 ObsIDs have been downloaded from the 
XMM archive; 5,642 ObsIDs have run through the event list 
cleaning pipeline; 4,029 ObsIDs have been processed by the 
source detection pipeline; 114,711 point sources and 12,582 
extended sources have been catalogued; 3,675 cluster can- 
didates have been selected, of which 993 were detected with 
more than 300 background-subtracted counts; 587 (122) X- 
ray temperatures have been measured with < 40 (< 10) per 
cent errors.

APPENDIX A: ADDITIONAL SUPPORTING FIGURES

Figure A1. Exposure maps relating to differing EPIC observing modes. In order from top left: MOS1 full window mode, MOS2 full
window mode, MOS fast uncompressed, MOS partial window W2 or W4 mode, MOS partial window W3 or W5 mode, MOS1 full window
mode with CCD6 switched off, pn full window mode, pn large window mode.

[Figure panel title: Detection Efficiencies: Lx = 0.178 x 10^44 erg s^-1, T = 3 keV]

Figure A3. Predicted recovery efficiency as a function of redshift for 3 keV clusters with a range of X-ray luminosities (bolometric).
The synthetic clusters used for this test had circularly symmetric β-profiles (β = 2/3) with core radii of 160 kpc. The typical luminosity
of a 3 keV cluster based on the local Lx-Tx relation is 1 to 2x10^44 erg s^-1.

Figure A4. Predicted recovery efficiency as a function of redshift for 6 keV clusters with a range of X-ray luminosities (bolometric).
The synthetic clusters used for this test had circularly symmetric β-profiles (β = 2/3) with core radii of 160 kpc. The typical luminosity
of a 6 keV cluster based on the local Lx-Tx relation is 8 to 15x10^44 erg s^-1.

Figure A5. Cash statistic output from the X-ray redshift fitting code, plotted against redshift. We show 12 XCS clusters that have 
both well measured (< 2.5% statistical uncertainty) X-ray redshifts and spectroscopically-determined optical redshifts. The optical and 
X-ray redshifts are indicated with red and green dotted lines respectively.

APPENDIX B: XAPA FLOWCHARTS

Figure B1. Flowchart of the process of XMM data reduction to produce cleaned event files, images and exposure maps. This overviews
the process by which the XMM data is acquired, reduced, and cleaned, and the products generated.

[Flowchart boxes: Make lightcurve; Calculate Mean and Standard Deviation; Calculate Count; Make Good Time; Filter Events; Clean Events]

Figure B2. Flowchart of the process for removing periods of high background due to variations in the particle flux to which the
instruments are exposed. This illustrates the process by which lightcurves of the raw event files have an iterative 3-σ clipping applied to
them until the mean count-rate stops changing. Cleaned event files are then produced for the good time intervals identified.
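
The iterative clipping described in the Figure B2 caption can be sketched as follows. This is a minimal illustration rather than the XCS cleaning code: the count rates are made-up values, the clipping is applied symmetrically about the mean (an assumption on our part, since flares only raise the rate), and the function name clip_lightcurve is ours.

```python
import statistics


def clip_lightcurve(rates, n_sigma=3.0, max_iter=50):
    """Flag the time bins kept after iterative n-sigma clipping of a lightcurve."""
    keep = [True] * len(rates)
    previous_mean = None
    for _ in range(max_iter):
        kept = [r for r, k in zip(rates, keep) if k]
        mean = statistics.fmean(kept)
        sigma = statistics.pstdev(kept)
        if previous_mean is not None and mean == previous_mean:
            break  # the mean count rate has stopped changing
        previous_mean = mean
        # Reject bins further than n_sigma standard deviations from the mean.
        keep = [k and abs(r - mean) <= n_sigma * sigma for r, k in zip(rates, keep)]
    return keep


# Hypothetical count rates per time bin, with a flare in bins 8 and 9.
rates = [5.1, 4.8, 5.3, 4.9, 5.0, 5.2, 4.7, 5.1, 48.0, 55.0,
         5.0, 4.9, 5.2, 5.0, 4.8, 5.1, 5.3, 4.9, 5.0, 5.1]
good = clip_lightcurve(rates)
print("bins rejected:", [i for i, k in enumerate(good) if not k])
```

The bins flagged True play the role of the good time intervals from which the cleaned event files are produced. In this toy case the brighter flare bin is removed on the first pass and the fainter one only on the second, which is why the clipping has to be iterated.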
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Figure B3. Flowchart for the md.detect routine, showing the two-stage (wavelet transform and reconstruction), two-pass (to 
reduce pollution of the wavelet signal by bright, compact sources) process.

Figure B4. Flowchart for the md_recon routine, showing the stages of the process to reconstruct a source list from the outputs of
wtransform on different scales.

Figure B5. Flowchart for the find_srcprop routine, showing the stages of the process to derive the properties of each source. The
routine is run in two stages; the first filters out sources with < 4-σ significance, and the second determines the extent probability.

Figure B6. Flowchart for the find_srcprop_final routine. This routine measures properties for every source that is output from
find_srcprop.

Figure B7. Flowchart of process for deriving luminosities and temperature from cluster spectra and images. This illustrates the process
by which models are fitted to the X-ray spectra and images to produce temperatures and aperture luminosities, which are corrected
using the fitted surface-brightness model to produce luminosities within R500.

[Flowchart boxes: Source Region; Background Region; Image; Exposure Map; Build Particle Test Regions; Calculate Counts in Regions; Calculate Background Count Rate; Calculate Particle Fraction; Calculate Background Map]

Figure B8. Flowchart of process for generating background maps for use in the XCS surface brightness fitting. This illustrates the
process by which the background measured in an annulus around the source is extrapolated to all positions in the image, taking into
account the exposure map and the fraction of particles in the background that are not vignetted by the telescope.

APPENDIX C: CLUSTERS USED FOR Tx AND Lx PIPELINE VALIDATION

Cluster Name z n H T| ° Tf° T-f rf yiooo T 2000 

(10^20 cm^-2) (keV) (keV) (keV) (keV) (keV) (keV)

XMMXCS J001737.4-005235.4 0.20 2.72 6.50jl|;|| 4.82+ 2 ;^ 5.36+^'^ 4.94+^|| 5.24±°;!J 4.57±°'g° 

XMMXCS J092018.9+370617.7 0.19 1.57 2.13±g;^ 2.51±°;^ 2.07±g;|| 2.44±°;g 2.28±°;g 2.48±9j^ 

XMMXCS J130749. 6+292549. 2 0.25 1.01 2.54^1^1 2.93±° 73 2.92t°'^ 2 2.94±°'j? J 3.17±q;|| 3.051^35 

XMMXCS J141832. 3+251104. 9 0.30 1.84 - - 6.811-5? 7.14lHt 7.79±,?2 6.301, ?I 



-2.21 ''^ -2.45 ' - ,i7 -2.56 U - JU -1.18 



Table C1. Clusters used for the comparison of temperature measurements with different numbers of soft-band source counts per
spectrum.

Cluster Name z nH On-axis Tx Off-axis Tx

(10^20 cm^-2) (keV) (keV)

+4.37 
-2.16 
+0.47 
-0.46 
+0.02 
-0.03 
+4.36 
-1.54 
+ 1.34 
-0.90 
+ 0.44 
-0.19 
+ 5.92 
-1.71 
+ 14.03 
-5.82 

Table C2. Clusters used for the on/off-axis comparison of temperature measurements. 



XMMXCS J100304.6+325337.9 


0.42 


1.55 




4.40; 


XMMXCS J151618.6+000531.3 


0.13 


4.66 


5.68«; 22 


5.11 


XMMXCS J184718.3-631959.3 


0.02 


6.87 


n 70+O.OI 

U -' 8 -0.02 


0.81" 


XMMXCS J130832.6+534213.8 


0.33 


1.58 


3 66+ ' 70 
l5 - DD -0.56 


4.45; 


XMMXCS J072054.3+710900.5 


0.23 


3.88 


2 99+ 1 - 48 
-0.92 


2.93 


XMMXCS J132508.7+655027.9 


0.18 


2.00 


1 01+"' 2 " 
J - ul -0.19 


0.71 


XMMXCS J022403.8-041333.4 


1.05 


2.51 


n yn+0.89 

°- ' -0.67 


4.07; 


XMMXCS J223520.4-255742.1 


1.39 


1.47 


9 45 +3.i9 

3 -^ d -2.44 


11.29 



Cluster Name 



"II 



(10 2 



Tnlit 

(keV) 



yXCS 
(keV) 



XMMU J131359.7-162735 





.28 


4 


.92 


o c-7+0.12 
• 3 - ot -0.12 


3. 


2XMM J100451.6+411627 





.82 





.89 


, 2 +0.4 
^■ z -0.4 


5. 


XLSS 


J022045.4-032558 





.33 


2 


.49 


i 7+0.3 
1,1 -0.2 


2. 


XLSS 


J022145. 2-034617 


0. 


,13 


2 


.52 


4 8+ ' 6 
^•°-0.5 


5. 


XLSS 


J022404.1-041330 


1 


.05 


2 


.51 


4 1+0-9 


3. 


XLSS 


J022433.8-041405 





.26 


2 


.46 


i q+0-2 
1- -Q.1 


1. 


XLSS 


J022457.1-034856 


0. 


.61 


2 


.49 


q n + 0.4 
°- -0.3 


3 


XLSS 


J022524.7-044039 





.26 


2 


,19 


2 0+ - 2 
z - u -0.2 


2. 


XLSS 


J022530.6-041420 


0. 


.14 


2 


.35 


1 34+0.14 


1. 


XLSS 


J022540.6-031121 





.14 


2 


.66 


q c+0.6 


4. 


XLSS 


J022616.3-023957 


0. 


.06 


2 


.67 


fiS+0- 03 
U - Dc> -0.03 


0. 


XLSS 


J022722.4-032144 





.33 


2 


.61 


2 4+0.5 
Z - -0.4 


2. 


XLSS 


J022739.9-045127 





.29 


2 


.59 


i 7 +0.l 
1 "'-0.1 


1. 


XLSS 


J022803.4-045103 





.29 


2 


.67 


9 0+O.6 
2 ' S -0.5 


2. 



5.36 



2.38 



+0.19 
-0.19 
+0.96 
0.78 
+0.92 
-0.51 
+ 1.83 
-1.20 
+0.51 
-0.38 
+0.19 
-0.24 
+ 1.23 
-0.85 
+0.99 
-0.53 
+0.12 
-0.13 
+0.52 
-0.51 
+0.04 
-0.04 
+ 1.02 
-0.64 
+0.33 
-0.18 
+2.13 
0.93 



Table C3. Clusters used for the literature comparison of temperature measurements. Redshifts and Tx values for the XLSS clusters come
from Pacaud et al. (2007), those for the XMMU and 2XMM clusters from Gastaldello et al. (2007) and Hoeft et al. (2008), respectively.



Cluster Name z njj 

(10 20 cn 



jXC'S 



XLSS J022726.0-043216 0.31 2.67 0.61±°,:2i 

XLSS J022356.5-030558 0.30 2.63 0.43t° °g 0.43j;J5;£g 

XLSS J022616.3-023957 0.06 2.67 0.69+°-° 7 5 0.791°,;°;; 

XLSS J022045.4-032558 0.33 2.49 0.54±^;g| 0.55±°,;°4 



Table C4. Clusters used for the literature β comparison with the work of Alshino et al. (2010).



Cluster Name z kh ^X 500 XCS ^xsoo 

(10^20 cm^-2) (10^43 erg s^-1) (10^43 erg s^-1)



XLSS J022540.6-031121 0.14 2.66 °- 93 ±o.06 ^-l.l 

XLSS J022616.3-023957 0.06 2.67 0.025l° °° 2 0.019±o.oi? 

XLSS J022404.1-041330 1.05 2.52 4.83±q H 4.9l|; 2 

XLSS J022206.7-030314 0.49 2.52 2.89±g;i| 2 ^tll 

XLSS J022457.1-034856 0.61 2.49 3.32^° s -°tt.l 

XLSS J022045.4-032558 0.33 2.49 0.38tn'2? 0.35tn1n 



Table C5. Clusters used for the literature luminosity comparison with the work of Pacaud et al. (2007).